Search Results for author: Saining Xie

Found 41 papers, 34 papers with code

V-IRL: Grounding Virtual Intelligence in Real Life

1 code implementation 5 Feb 2024 Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie

There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created.

Decision Making

Deconstructing Denoising Diffusion Models for Self-Supervised Learning

1 code implementation 25 Jan 2024 Xinlei Chen, Zhuang Liu, Saining Xie, Kaiming He

In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation.

Denoising Image Generation +3

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

1 code implementation 16 Jan 2024 Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, Saining Xie

We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT).

Image Generation

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

1 code implementation 11 Jan 2024 Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie

To understand the roots of these errors, we explore the gap between the visual embedding space of CLIP and vision-only self-supervised learning.

Representation Learning Self-Supervised Learning +1

Image Sculpting: Precise Object Editing with 3D Geometry Control

no code implementations 2 Jan 2024 Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie

We present Image Sculpting, a new framework for editing 2D images by incorporating tools from 3D geometry and graphics.

Object

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs

1 code implementation 21 Dec 2023 Penghao Wu, Saining Xie

However, the lack of this visual search mechanism in current multimodal LLMs (MLLMs) hinders their ability to focus on important visual details, especially when handling high-resolution and visually crowded images.

Visual Question Answering World Knowledge

Demystifying CLIP Data

2 code implementations 28 Sep 2023 Hu Xu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer

We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective.

Going Denser with Open-Vocabulary Part Segmentation

2 code implementations ICCV 2023 Peize Sun, Shoufa Chen, Chenchen Zhu, Fanyi Xiao, Ping Luo, Saining Xie, Zhicheng Yan

In this paper, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.

Object object-detection +3

CiT: Curation in Training for Effective Vision-Language Data

1 code implementation ICCV 2023 Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer

Large vision-language models are generally applicable to many downstream tasks, but come at an exorbitant training cost that only large institutions can afford.

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

10 code implementations CVPR 2023 Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie

This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation.

Object Detection Representation Learning +2

Exploring Long-Sequence Masked Autoencoders

1 code implementation 13 Oct 2022 Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen

Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains.

Object Detection Segmentation +1

A ConvNet for the 2020s

45 code implementations CVPR 2022 Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Classification Domain Generalization +3

A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

no code implementations 27 Dec 2021 Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Saining Xie, Madian Khabsa, Hamed Pirsiavash, Hamed Firooz

In this paper, we focus on teasing out what parts of the language supervision are essential for training zero-shot image classification models.

Classification Image Captioning +3

SLIP: Self-supervision meets Language-Image Pre-training

1 code implementation 23 Dec 2021 Norman Mu, Alexander Kirillov, David Wagner, Saining Xie

Across ImageNet and a battery of additional datasets, we find that SLIP improves accuracy by a large margin.

Multi-Task Learning Representation Learning +1

Benchmarking Detection Transfer Learning with Vision Transformers

2 code implementations 22 Nov 2021 Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick

The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.

Benchmarking object-detection +3

Pri3D: Can 3D Priors Help 2D Representation Learning?

1 code implementation ICCV 2021 Ji Hou, Saining Xie, Benjamin Graham, Angela Dai, Matthias Nießner

Inspired by these advances in geometric understanding, we aim to imbue image-based perception with representations learned under geometric constraints.

Contrastive Learning Instance Segmentation +5

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness

1 code implementation NeurIPS 2021 Eric Mintun, Alexander Kirillov, Saining Xie

Invariance to a broad array of image corruptions, such as warping, noise, or color shifts, is an important aspect of building robust models in computer vision.

Is Robustness Robust? On the interaction between augmentations and corruptions

no code implementations 1 Jan 2021 Eric Mintun, Alexander Kirillov, Saining Xie

Invariance to a broad array of image corruptions, such as warping, noise, or color shifts, is an important aspect of building robust models in computer vision.

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts

2 code implementations CVPR 2021 Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie

The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g., point clouds) are notoriously hard.

3D Semantic Segmentation Instance Segmentation +2

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding

2 code implementations ECCV 2020 Saining Xie, Jiatao Gu, Demi Guo, Charles R. Qi, Leonidas J. Guibas, Or Litany

To this end, we select a suite of diverse datasets and tasks to measure the effect of unsupervised pre-training on a large source set of 3D scenes.

Point Cloud Pre-training Representation Learning +3

Graph Structure of Neural Networks

3 code implementations ICML 2020 Jiaxuan You, Jure Leskovec, Kaiming He, Saining Xie

Neural networks are often represented as graphs of connections between neurons.

Clustering

Are Labels Necessary for Neural Architecture Search?

2 code implementations ECCV 2020 Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie

Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.

Neural Architecture Search

Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search

no code implementations 25 Sep 2019 Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian

As a result, using manually designed action space to perform NAS often leads to sample-inefficient explorations of architectures and thus can be sub-optimal.

Bayesian Optimization Neural Architecture Search

Sample-Efficient Neural Architecture Search by Learning Action Space

1 code implementation 17 Jun 2019 Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian

To improve the sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS), which learns actions to recursively partition the search space into good or bad regions that contain networks with similar performance metrics.

Evolutionary Algorithms Neural Architecture Search

On Network Design Spaces for Visual Recognition

4 code implementations ICCV 2019 Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár

Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape.

Neural Architecture Search

Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search

1 code implementation 1 Jan 2019 Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian

To improve the sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS), which learns actions to recursively partition the search space into good or bad regions that contain networks with similar performance metrics.

Evolutionary Algorithms Image Classification +1

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

1 code implementation ECCV 2018 Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy

Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification.

Ranked #27 on Action Recognition on UCF101 (using extra training data)

Action Classification Action Detection +6

Top-Down Learning for Structured Labeling with Convolutional Pseudoprior

no code implementations 23 Nov 2015 Saining Xie, Xun Huang, Zhuowen Tu

Current practice in convolutional neural networks (CNN) remains largely bottom-up and the role of top-down process in CNN for pattern analysis and visual inference is not very clear.

Hyper-Class Augmented and Regularized Deep Learning for Fine-Grained Image Classification

no code implementations CVPR 2015 Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin

We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.

Fine-Grained Image Classification General Classification +3

Holistically-Nested Edge Detection

17 code implementations ICCV 2015 Saining Xie, Zhuowen Tu

We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning.

Boundary Detection Edge Detection

Deeply-Supervised Nets

1 code implementation 18 Sep 2014 Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu

Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.

Classification General Classification +1
