Search Results for author: Xiaohua Zhai

Found 45 papers, 35 papers with code

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

no code implementations · 7 Mar 2024 · Ibrahim Alabdulmohsin, Xiao Wang, Andreas Steiner, Priya Goyal, Alexander D'Amour, Xiaohua Zhai

Interestingly, data and architectural improvements seem to mitigate the negative impact of data balancing on performance; e.g., applying M4 to SigLIP-B/16 with data quality filters improves COCO image-to-text retrieval @5 from 86% (without data balancing) to 87%, and ImageNet zero-shot classification from 77% to 77.5%!

Image-to-Text Retrieval · Retrieval · +1

SILC: Improving Vision Language Pretraining with Self-Distillation

no code implementations · 20 Oct 2023 · Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc van Gool, Federico Tombari

However, the contrastive objective used by these models only focuses on image-text alignment and does not incentivise image feature learning for dense prediction tasks.

Classification · Contrastive Learning · +8

Image Captioners Are Scalable Vision Learners Too

1 code implementation · NeurIPS 2023 · Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer

We further analyze the effect of the model architecture and scale, as well as the pretraining data on the representation quality, and find that captioning exhibits the same or better scaling behavior along these axes.

Image Captioning

Tuning computer vision models with task rewards

1 code implementation · 16 Feb 2023 · André Susano Pinto, Alexander Kolesnikov, Yuge Shi, Lucas Beyer, Xiaohua Zhai

Misalignment between model predictions and intended usage can be detrimental for the deployment of computer vision models.

Colorization · Image Captioning · +5

Revisiting Neural Scaling Laws in Language and Vision

1 code implementation · 13 Sep 2022 · Ibrahim Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai

The remarkable progress in deep learning in recent years is largely driven by improvements in scale, where bigger models are trained on larger datasets for longer schedules.

Image Classification · Language Modelling · +3
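The core idea behind scaling laws, that loss falls predictably as a power law in model or data size, can be illustrated with a toy fit. The sketch below (pure Python, illustrative names and synthetic data, not from the paper, which studies more general saturating functional forms) recovers the exponent of a clean power law by linear regression in log-log space:

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ~ a * size**(-b) by least squares in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic losses generated to follow L = 5.0 * N**(-0.3) exactly.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.3 for n in sizes]
a, b = fit_power_law(sizes, losses)
```

Real learning curves saturate at an irreducible error floor, which is why the paper's proposed estimators go beyond this simple two-parameter form.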

Better plain ViT baselines for ImageNet-1k

5 code implementations · 3 May 2022 · Lucas Beyer, Xiaohua Zhai, Alexander Kolesnikov

It is commonly accepted that the Vision Transformer model requires sophisticated regularization techniques to excel at ImageNet-1k scale data.

Data Augmentation · Image Classification

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation

3 code implementations · 17 Dec 2021 · Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

In this paper, we comprehensively study three architecture design choices on ViT -- spatial reduction, doubled channels, and multiscale features -- and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy.

Image Classification · Instance Segmentation · +6

LiT: Zero-Shot Transfer with Locked-image text Tuning

4 code implementations · CVPR 2022 · Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training.

Image Classification · Retrieval · +2
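Contrastive-tuning aligns the two towers with a symmetric InfoNCE-style loss: matched image/text pairs sit on the diagonal of a cosine-similarity matrix. The pure-Python sketch below shows only the loss computation on toy embeddings (all names, dimensions, and the temperature value are illustrative, not the paper's implementation); in LiT-style tuning the image embeddings come from a frozen, locked tower and only the text tower receives gradients.

```python
import math

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss; matched image/text pairs share an index."""
    def norm(v):
        s = math.sqrt(sum(x * x for x in v))
        return [x / s for x in v]
    imgs = [norm(v) for v in img_emb]
    txts = [norm(v) for v in txt_emb]
    # Cosine-similarity logits, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    def xent(rows):  # mean cross-entropy with the diagonal as target
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[k]
        return total / len(rows)
    cols = [list(c) for c in zip(*logits)]  # text-to-image direction
    return 0.5 * (xent(logits) + xent(cols))

# Perfectly aligned toy pairs yield a low loss; mismatched pairs a higher one.
aligned = info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = info_nce([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

Freezing the image tower is the design choice the title refers to: the pre-trained image representation is kept intact while the text tower learns to read it.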

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

16 code implementations · 18 Jun 2021 · Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer

Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation.

Data Augmentation · Image Classification · +5

Scaling Vision Transformers

1 code implementation · CVPR 2022 · Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer

As a result, we successfully train a ViT model with two billion parameters, which attains a new state-of-the-art on ImageNet of 90.45% top-1 accuracy.

Ranked #3 on Image Classification on VTAB-1k (using extra training data)

Few-Shot Image Classification · Few-Shot Learning

Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

1 code implementation · 6 Apr 2021 · Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB).

Few-Shot Learning · General Classification · +1

Training general representations for remote sensing using in-domain knowledge

no code implementations · 30 Sep 2020 · Maxim Neumann, André Susano Pinto, Xiaohua Zhai, Neil Houlsby

Automatically finding good and general remote sensing representations makes it possible to perform transfer learning on a wide range of applications, improving accuracy and reducing the required number of training samples.

Representation Learning · Transfer Learning

Self-Supervised Learning of Video-Induced Visual Invariances

no code implementations · CVPR 2020 · Michael Tschannen, Josip Djolonga, Marvin Ritter, Aravindh Mahendran, Xiaohua Zhai, Neil Houlsby, Sylvain Gelly, Mario Lucic

We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI).

Ranked #15 on Image Classification on VTAB-1k (using extra training data)

Image Classification · Self-Supervised Learning · +1

In-domain representation learning for remote sensing

1 code implementation · 15 Nov 2019 · Maxim Neumann, André Susano Pinto, Xiaohua Zhai, Neil Houlsby

Given the importance of remote sensing, surprisingly little attention has been paid to it by the representation learning community.

Ranked #1 on Multi-Label Image Classification on BigEarthNet (mAP (macro) metric)

Multi-Label Image Classification · Representation Learning · +1

The GAN Landscape: Losses, Architectures, Regularization, and Normalization

no code implementations · ICLR 2019 · Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion.

Self-Supervised GANs via Auxiliary Rotation Loss

4 code implementations · CVPR 2019 · Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, Neil Houlsby

In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs.

Image Generation · Representation Learning
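The auxiliary rotation task behind this line of work is simple to state: rotate each image by one of {0°, 90°, 180°, 270°} and train the discriminator to predict which rotation was applied. A minimal sketch of the data side, using nested lists in place of real image tensors (function names are illustrative, not from the paper's code):

```python
def rot90(img):
    """Rotate an H x W image (list of rows) 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def rotation_batch(img):
    """Return (rotated_image, label) pairs for the 4-way auxiliary task.
    A self-supervised head on the discriminator is trained to predict
    the label, which encourages it to learn useful image features."""
    out = []
    for k in range(4):
        out.append(([row[:] for row in img], k))  # label k = k * 90 degrees
        img = rot90(img)
    return out

img = [[1, 2],
       [3, 4]]
batch = rotation_batch(img)
```

Because the labels are generated from the data itself, this objective needs no human annotation, which is what makes it a natural companion to GAN training.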

Self-Supervised GAN to Counter Forgetting

no code implementations · 27 Oct 2018 · Ting Chen, Xiaohua Zhai, Neil Houlsby

To counter forgetting, we encourage the discriminator to maintain useful representations by adding a self-supervision task.

Continual Learning · General Classification

A Large-Scale Study on Regularization and Normalization in GANs

5 code implementations · ICLR 2019 · Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion.
