Search Results for author: Xinlei Chen

Found 54 papers, 38 papers with code

Massive Activations in Large Language Models

1 code implementation • 27 Feb 2024 • MingJie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu

We observe an empirical phenomenon in Large Language Models (LLMs) -- very few activations exhibit significantly larger values than others (e.g., 100,000 times larger).
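
The phenomenon lends itself to a simple diagnostic: scan a layer's hidden states for entries that are orders of magnitude above the typical activation. Below is a minimal numpy sketch, assuming a median-based baseline and a 100x ratio; both are illustrative choices, not the paper's criterion.

```python
import numpy as np

def find_massive_activations(hidden, ratio=100.0):
    """Flag activations whose magnitude dwarfs the layer's typical value.

    hidden: (seq_len, d_model) hidden states from one layer. An entry is
    flagged when |value| > ratio * median(|values|); the ratio and the
    median baseline are illustrative assumptions.
    """
    mags = np.abs(hidden)
    threshold = ratio * np.median(mags)
    rows, dims = np.nonzero(mags > threshold)
    return list(zip(rows.tolist(), dims.tolist()))

# Toy check: one planted outlier stands out against unit-scale noise.
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))
h[0, 3] = 1e4
print(find_massive_activations(h))  # [(0, 3)]
```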

Revisiting Feature Prediction for Learning Visual Representations from Video

1 code implementation • arXiv preprint 2024 • Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, Nicolas Ballas

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision.

Deconstructing Denoising Diffusion Models for Self-Supervised Learning

1 code implementation • 25 Jan 2024 • Xinlei Chen, Zhuang Liu, Saining Xie, Kaiming He

In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation.

Denoising • Image Generation • +3

Learning to (Learn at Test Time)

1 code implementation • 20 Oct 2023 • Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen

Our inner loop turns out to be equivalent to linear attention when the inner-loop learner is only a linear model, and to self-attention when it is a kernel estimator.
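
The first equivalence can be checked numerically in its simplest form: unnormalized linear attention over (key, value) pairs equals the prediction of a linear model trained from zero initialization for one gradient step (learning rate 1) on those pairs as a squared-loss regression. A minimal numpy sketch under those simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
K = rng.normal(size=(5, d))   # keys: inner-loop inputs
V = rng.normal(size=(5, d))   # values: inner-loop regression targets
q = rng.normal(size=(d,))     # one query

# Unnormalized linear attention: sum_j (q . k_j) * v_j
attn_out = (K @ q) @ V

# Inner loop: one gradient step on 0.5 * sum_j ||W k_j - v_j||^2,
# starting from W = 0 with learning rate 1. The gradient at W = 0
# is -V^T K, so the step lands on W = V^T K.
W = np.zeros((d, d))
W -= 1.0 * ((K @ W.T - V).T @ K)
inner_out = W @ q             # = V^T K q = sum_j v_j (k_j . q)

print(np.allclose(attn_out, inner_out))  # True
```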

Test-Time Training on Video Streams

no code implementations • 11 Jul 2023 • Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang

Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders.

Image Reconstruction • Panoptic Segmentation
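
To make the test-time training loop concrete, here is a minimal sketch: a toy linear model takes a few gradient steps on a crude masked-reconstruction objective for the test instance itself (no label involved), and only then produces features for prediction. The linear model, mask ratio, step count, and learning rate are all illustrative stand-ins for the paper's masked autoencoder setup.

```python
import numpy as np

def masked_recon_grad(W, x, mask):
    """Gradient of 0.5 * ||W (x * mask) - x||^2 w.r.t. W: reconstruct
    the full input from its masked version (a crude MAE-style objective)."""
    xm = x * mask
    return np.outer(W @ xm - x, xm)

rng = np.random.default_rng(0)
d = 16
W = rng.normal(scale=0.1, size=(d, d))        # stand-in for the model

# Test-time training: adapt on the single test instance via
# self-supervision before making a prediction on it.
x_test = rng.normal(size=d)
mask = (rng.random(d) < 0.75).astype(float)   # ~25% of entries hidden
for _ in range(10):
    W -= 0.05 * masked_recon_grad(W, x_test, mask)
features = W @ x_test   # a downstream head would consume these
```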

R-MAE: Regions Meet Masked Autoencoders

1 code implementation • 8 Jun 2023 • Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen

In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning.

Contrastive Learning • Interactive Segmentation • +4

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

10 code implementations • CVPR 2023 • Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie

This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation.

Object Detection • Representation Learning • +2

Exploring Long-Sequence Masked Autoencoders

1 code implementation • 13 Oct 2022 • Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen

Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains.

Object Detection • Segmentation • +1

Test-Time Training with Masked Autoencoders

1 code implementation • 15 Sep 2022 • Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A. Efros

Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision.

On the Importance of Asymmetry for Siamese Representation Learning

1 code implementation • CVPR 2022 • Xiao Wang, Haoqi Fan, Yuandong Tian, Daisuke Kihara, Xinlei Chen

Many recent self-supervised frameworks for visual representation learning are based on certain forms of Siamese networks.

Representation Learning

Benchmarking Detection Transfer Learning with Vision Transformers

2 code implementations • 22 Nov 2021 • Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick

The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.

Benchmarking • object-detection • +3

Towards Demystifying Representation Learning with Non-contrastive Self-supervision

2 code implementations • 11 Oct 2021 • Xiang Wang, Xinlei Chen, Simon S. Du, Yuandong Tian

Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image.

Representation Learning • Self-Supervised Learning

NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training

1 code implementation • ICLR 2022 • Chengyue Gong, Dilin Wang, Meng Li, Xinlei Chen, Zhicheng Yan, Yuandong Tian, Qiang Liu, Vikas Chandra

In this work, we observe that the poor performance is due to a gradient conflict issue: the gradients of different sub-networks conflict with that of the supernet more severely in ViTs than in CNNs, which leads to early saturation in training and inferior convergence.

Data Augmentation • Image Classification • +2
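
A standard way to quantify the conflict the abstract describes is the cosine similarity between the flattened gradient of a sub-network and that of the supernet; negative values mean the two updates pull in opposing directions. A minimal diagnostic sketch (this measures the conflict, it is not NASViT's training remedy):

```python
import numpy as np

def gradient_conflict(grads_a, grads_b):
    """Cosine similarity between two lists of per-parameter gradients,
    flattened into single vectors; < 0 indicates conflicting updates."""
    a = np.concatenate([g.ravel() for g in grads_a])
    b = np.concatenate([g.ravel() for g in grads_b])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy gradients pointing in exactly opposite directions -> -1.0
g_subnet = [np.array([1.0, -2.0]), np.array([[0.5]])]
g_supernet = [np.array([-1.0, 2.0]), np.array([[-0.5]])]
print(gradient_conflict(g_subnet, g_supernet))  # -1.0
```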

A Data-Efficient Approach to Behind-the-Meter Solar Generation Disaggregation

no code implementations • 17 May 2021 • Xinlei Chen, Moosa Moghimi Haji, Omid Ardakanian

With the emergence of cost-effective battery storage and the decline in the solar photovoltaic (PV) levelized cost of energy (LCOE), the number of behind-the-meter solar PV systems is expected to increase steadily.

Non-Intrusive Load Monitoring

Understanding self-supervised Learning Dynamics without Contrastive Pairs

5 code implementations • 12 Feb 2021 • Yuandong Tian, Xinlei Chen, Surya Ganguli

While contrastive approaches of self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point (positive pairs) and maximizing the distance between views from different data points (negative pairs), recent non-contrastive SSL methods (e.g., BYOL and SimSiam) show remarkable performance without negative pairs, using an extra learnable predictor and a stop-gradient operation.

Self-Supervised Learning

Exploring Simple Siamese Representation Learning

26 code implementations • CVPR 2021 • Xinlei Chen, Kaiming He

Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing.

Representation Learning • Self-Supervised Image Classification
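
The role of the stop-gradient is easiest to see in the symmetrized negative-cosine loss: each predictor output p is pulled toward the detached projection z of the other view, so the targets act as constants. A condensed PyTorch-style sketch; the projector/predictor heads producing z and p are assumed, following the usual SimSiam setup:

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    """Symmetrized SimSiam loss; .detach() is the stop-gradient that
    the paper identifies as essential for preventing collapse."""
    def neg_cos(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return 0.5 * neg_cos(p1, z2) + 0.5 * neg_cos(p2, z1)

# p1, p2: predictor outputs; z1, z2: projector outputs, one pair per
# augmented view of the same batch of images (shape: batch x dim).
p1 = torch.randn(4, 8, requires_grad=True)
p2 = torch.randn(4, 8, requires_grad=True)
z1, z2 = torch.randn(4, 8), torch.randn(4, 8)
loss = simsiam_loss(p1, p2, z1, z2)
loss.backward()   # gradients reach p1/p2 only, never the targets
```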

Understanding Self-supervised Learning with Dual Deep Networks

2 code implementations • 1 Oct 2020 • Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli

We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR).

Self-Supervised Learning

Overcoming Statistical Shortcuts for Open-ended Visual Counting

1 code implementation • 17 Jun 2020 • Corentin Dancette, Remi Cadene, Xinlei Chen, Matthieu Cord

First, we propose the Modifying Count Distribution (MCD) protocol, which penalizes models that over-rely on statistical shortcuts.

MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond

1 code implementation • ICLR 2021 • Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen

This paper focuses on visual counting, which aims to predict the number of occurrences given a natural image and a query (e.g., a question or a category).

Object Counting • Question Answering • +1

Improved Baselines with Momentum Contrastive Learning

36 code implementations • 9 Mar 2020 • Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.

Contrastive Learning • Data Augmentation • +3
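
The piece of MoCo that "momentum contrast" refers to is the momentum (exponential-moving-average) update of the key encoder from the query encoder, which keeps the dictionary of negatives consistent across iterations. A minimal sketch with toy parameter dicts; m = 0.999 is the commonly used default:

```python
import numpy as np

def momentum_update(params_q, params_k, m=0.999):
    """Key encoder <- m * key encoder + (1 - m) * query encoder,
    applied parameter by parameter after each training step."""
    for name, w_q in params_q.items():
        params_k[name] = m * params_k[name] + (1.0 - m) * w_q
    return params_k

params_q = {"w": np.ones((2, 2))}      # query encoder (trained by SGD)
params_k = {"w": np.zeros((2, 2))}     # key encoder (momentum copy)
params_k = momentum_update(params_q, params_k)
print(params_k["w"][0, 0])             # 0.001: slow drift toward q
```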

ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes

1 code implementation • CVPR 2020 • Charles R. Qi, Xinlei Chen, Or Litany, Leonidas J. Guibas

Compared to prior work on multi-modal detection, we explicitly extract both geometric and semantic features from the 2D images.

Ranked #2 on 3D Object Detection on SUN-RGBD (using extra training data)

3D Object Detection • object-detection • +1

In Defense of Grid Features for Visual Question Answering

2 code implementations • CVPR 2020 • Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen

Popularized as 'bottom-up' attention, bounding box (or region) based visual features have recently surpassed vanilla grid-based convolutional features as the de facto standard for vision and language tasks like visual question answering (VQA).

Image Captioning • Question Answering • +1

Embodied Visual Recognition

no code implementations • 9 Apr 2019 • Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra

Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded.

Object • Object Localization • +1

Multi-Target Embodied Question Answering

1 code implementation • CVPR 2019 • Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra

To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.

Embodied Question Answering • Navigate • +1

TensorMask: A Foundation for Dense Object Segmentation

2 code implementations • ICCV 2019 • Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár

To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.

Instance Segmentation • Object • +4

Cycle-Consistency for Robust Visual Question Answering

no code implementations • CVPR 2019 • Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh

Despite significant progress in Visual Question Answering over the years, the robustness of today's VQA models leaves much to be desired.

Question Answering • Question Generation • +2

nocaps: novel object captioning at scale

2 code implementations • ICCV 2019 • Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.

Image Captioning • Object • +2

Grounded Video Description

2 code implementations • CVPR 2019 • Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach

Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.

Sentence • Video Description

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

9 code implementations • 26 Jul 2018 • Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, Devi Parikh

We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset -- from 65.67% to 70.22%.

Data Augmentation • Visual Question Answering (VQA)

Iterative Visual Reasoning Beyond Convolutions

no code implementations • CVPR 2018 • Xinlei Chen, Li-Jia Li, Li Fei-Fei, Abhinav Gupta

The framework consists of two core modules: a local module that uses spatial memory to store previous beliefs with parallel updates; and a global graph-reasoning module.

Visual Reasoning

Spatial Memory for Context Reasoning in Object Detection

36 code implementations • ICCV 2017 • Xinlei Chen, Abhinav Gupta

On the other hand, modeling object-object relationships requires spatial reasoning -- not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns.

Object • Object Detection

PixelNet: Representation of the pixels, by the pixels, and for the pixels

1 code implementation • 21 Feb 2017 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.

Edge Detection • Segmentation • +2

An Implementation of Faster RCNN with Study for Region Sampling

48 code implementations • 7 Feb 2017 • Xinlei Chen, Abhinav Gupta

We adapted the joint-training scheme of the Faster RCNN framework from Caffe to TensorFlow as a baseline implementation for object detection.

General Classification • Object Detection

PixelNet: Towards a General Pixel-level Architecture

no code implementations • 21 Sep 2016 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.

Edge Detection • Semantic Segmentation • +1

Visualizing and Understanding Neural Models in NLP

1 code implementation • NAACL 2016 • Jiwei Li, Xinlei Chen, Eduard Hovy, Dan Jurafsky

While neural networks have been successfully applied to many NLP tasks, the resulting vector-based models are very difficult to interpret.

Negation • Sentence

Mind's Eye: A Recurrent Visual Representation for Image Caption Generation

no code implementations • CVPR 2015 • Xinlei Chen, C. Lawrence Zitnick

Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.

Caption Generation • Image Retrieval • +2

Sense Discovery via Co-Clustering on Images and Text

no code implementations • CVPR 2015 • Xinlei Chen, Alan Ritter, Abhinav Gupta, Tom Mitchell

We present a co-clustering framework that can be used to discover multiple semantic and visual senses of a given Noun Phrase (NP).

Clustering

Webly Supervised Learning of Convolutional Networks

no code implementations • ICCV 2015 • Xinlei Chen, Abhinav Gupta

Specifically inspired by curriculum learning, we present a two-step approach for CNN training.

Image Retrieval

Learning a Recurrent Visual Representation for Image Caption Generation

no code implementations • 20 Nov 2014 • Xinlei Chen, C. Lawrence Zitnick

Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.

Caption Generation • Image Retrieval • +2

Enriching Visual Knowledge Bases via Object Discovery and Segmentation

no code implementations • CVPR 2014 • Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta

In this paper, we propose to enrich these knowledge bases by automatically discovering objects and their segmentations from noisy Internet images.

Object Discovery Segmentation
