Search Results for author: Yaodong Yu

Found 33 papers, 19 papers with code

Masked Completion via Structured Diffusion with White-Box Transformers

1 code implementation • 3 Apr 2024 • Druv Pai, Ziyang Wu, Sam Buchanan, Yaodong Yu, Yi Ma

We do this by exploiting a fundamental connection between diffusion, compression, and (masked) completion, deriving a deep transformer-like masked autoencoder architecture, called CRATE-MAE, in which the role of each layer is mathematically fully interpretable: they transform the data distribution to and from a structured representation.

Representation Learning

Differentially Private Representation Learning via Image Captioning

no code implementations • 4 Mar 2024 • Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo

In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.

Image Captioning · Representation Learning

A Study on the Calibration of In-context Learning

no code implementations • 7 Dec 2023 • Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs).

In-Context Learning · Natural Language Understanding · +1
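For reference, the calibration studied in the entry above is usually quantified with the expected calibration error (ECE): bin predictions by confidence and average the gap between confidence and accuracy within each bin. The following is a minimal NumPy sketch of binned ECE; it is standard background rather than code from the paper, and the bin count is an arbitrary choice.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average gap between confidence and accuracy.

    confidences: predicted probability of the predicted class, shape (n,)
    correct:     1.0 if the prediction was right, else 0.0, shape (n,)
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```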

White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?

1 code implementation • 22 Nov 2023 • Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma

This leads to a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable.

Data Compression · Denoising · +1

Emergence of Segmentation with Minimalistic White-Box Transformers

1 code implementation • 30 Aug 2023 • Yaodong Yu, Tianzhe Chu, Shengbang Tong, Ziyang Wu, Druv Pai, Sam Buchanan, Yi Ma

Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection.

Segmentation · Self-Supervised Learning

Scaff-PD: Communication Efficient Fair and Robust Federated Learning

no code implementations • 25 Jul 2023 • Yaodong Yu, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan

We present Scaff-PD, a fast and communication-efficient algorithm for distributionally robust federated learning.

Fairness · Federated Learning

ViP: A Differentially Private Foundation Model for Computer Vision

1 code implementation • 15 Jun 2023 • Yaodong Yu, Maziar Sanjabi, Yi Ma, Kamalika Chaudhuri, Chuan Guo

In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee.
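As background for the differential privacy guarantee mentioned above, DP training of vision models typically relies on DP-SGD: clip each per-example gradient and add Gaussian noise before averaging. The sketch below shows that mechanism on a toy linear least-squares model; it is not the ViP training recipe, and the clipping norm and noise multiplier are illustrative values only.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step for least-squares on a toy linear model.

    Per-example gradients are clipped to `clip_norm`, Gaussian noise scaled by
    `noise_multiplier * clip_norm` is added, and the noisy sum is averaged.
    """
    n = len(y)
    # Per-example gradients of 0.5 * (x.w - y)^2
    residuals = X @ w - y                        # shape (n,)
    per_example_grads = residuals[:, None] * X   # shape (n, d)
    # Clip each per-example gradient to L2 norm <= clip_norm
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    per_example_grads /= np.maximum(1.0, norms / clip_norm)
    # Add Gaussian noise to the summed gradient, then average
    noise = noise_multiplier * clip_norm * np.random.randn(X.shape[1])
    noisy_mean_grad = (per_example_grads.sum(axis=0) + noise) / n
    return w - lr * noisy_mean_grad
```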

White-Box Transformers via Sparse Rate Reduction

1 code implementation • NeurIPS 2023 • Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Benjamin D. Haeffele, Yi Ma

Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens.

Representation Learning
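To make the alternating-optimization picture above concrete, here is a schematic single-head sketch of a "compress then sparsify" block: an attention-like step that nudges tokens toward a lower coding rate, followed by one ISTA step (gradient step plus soft thresholding) that sparsifies them. It only illustrates the two roles the abstract assigns to self-attention and the MLP and is not the CRATE reference implementation; the shapes, step sizes, and single-head simplification are assumptions.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def white_box_block(Z, U, D, step=0.1, lam=0.05):
    """Schematic "compress then sparsify" block.

    Z: token representations, shape (n_tokens, d)
    U: projection used by the attention-like compression step, shape (d, d)
    D: dictionary used by the ISTA-like sparsification step, shape (d, d)
    """
    # 1) Compression: a single-head, attention-like step that pulls tokens
    #    toward a more compact configuration of their projections.
    P = Z @ U                                      # (n, d) projected tokens
    attn = softmax(P @ P.T / np.sqrt(Z.shape[1]))  # (n, n) similarities
    Z = Z + step * (attn @ P @ U.T - Z)
    # 2) Sparsification: one ISTA step on min_A ||Z - A D^T||^2 + lam * ||A||_1.
    A = Z @ D
    A = A - step * (A @ D.T - Z) @ D                          # gradient step
    A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0.0)  # soft threshold
    return A                                       # next-layer tokens (sparse codes)
```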

Federated Conformal Predictors for Distributed Uncertainty Quantification

1 code implementation • 27 May 2023 • Charles Lu, Yaodong Yu, Sai Praneeth Karimireddy, Michael I. Jordan, Ramesh Raskar

Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning since it can be easily applied as a post-processing step to already trained models.

Conformal Prediction · Federated Learning · +1
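For orientation, the post-processing step referenced above is standard split conformal prediction: compute nonconformity scores on a held-out calibration set, take a conservative quantile, and return the set of classes whose score clears it at test time. A minimal single-site sketch follows (function names are illustrative; the paper's contribution is carrying out this calibration across distributed clients).

```python
import numpy as np

def conformal_calibrate(cal_probs, cal_labels, alpha=0.1):
    """Return the score threshold from a calibration set.

    cal_probs:  predicted class probabilities, shape (n, n_classes)
    cal_labels: true labels, shape (n,)
    """
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def conformal_predict_set(test_probs, q):
    """Prediction set: all classes whose nonconformity score is below q."""
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```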

TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels

1 code implementation • 13 Jul 2022 • Yaodong Yu, Alexander Wei, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan

Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e.g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation.

Federated Learning
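The "convexified problem" in the second stage is the empirical neural tangent kernel linearization of the trained network around its stage-one weights $\theta_0$. Up to notation (stated here for orientation rather than quoted from the paper), the objective has the form

$$ \min_{\theta}\; \sum_{i=1}^{n} \ell\Big( f(x_i;\theta_0) + \nabla_\theta f(x_i;\theta_0)^\top (\theta - \theta_0),\; y_i \Big), $$

which is convex in $\theta$ whenever the loss $\ell$ is convex in its first argument.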

Robust Calibration with Multi-domain Temperature Scaling

no code implementations • 6 Jun 2022 • Yaodong Yu, Stephen Bates, Yi Ma, Michael I. Jordan

Uncertainty quantification is essential for the reliable deployment of machine learning models to high-stakes application domains.

Uncertainty Quantification
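For context, standard single-domain temperature scaling, which the multi-domain method above generalizes, fits one scalar $T > 0$ on held-out logits by minimizing the negative log-likelihood. A minimal sketch (background only, not the paper's multi-domain procedure); the bounds are arbitrary illustrative values.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Fit a single temperature T > 0 on held-out logits by minimizing NLL."""
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)  # stabilize softmax
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()
    result = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded")
    return result.x
```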

Conditional Supervised Contrastive Learning for Fair Text Classification

1 code implementation • 23 May 2022 • Jianfeng Chi, William Shand, Yaodong Yu, Kai-Wei Chang, Han Zhao, Yuan Tian

Contrastive representation learning has gained much attention due to its superior performance in learning representations from both image and sequential data.

Contrastive Learning · Fairness · +3

Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

no code implementations • 15 May 2022 • Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan

Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings.

Bayesian Optimization

What You See is What You Get: Principled Deep Learning via Distributional Generalization

1 code implementation • 7 Apr 2022 • Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran

In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization.

Predicting Out-of-Distribution Error with the Projection Norm

1 code implementation • 11 Feb 2022 • Yaodong Yu, Zitong Yang, Alexander Wei, Yi Ma, Jacob Steinhardt

Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels.

Pseudo Label · text-classification · +1
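A minimal sketch of the two-step recipe described above, using a scikit-learn logistic regression as a stand-in for the network and the parameter-space distance between the reference model and the pseudo-label-trained model as the score. The paper's exact training and distance choices may differ; this only illustrates the pseudo-label-then-retrain structure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def projection_norm_score(ref_model, X_test):
    """Pseudo-label the test set, train a fresh model on the pseudo-labels,
    and measure how far its parameters move from the reference model.

    Assumes the pseudo-labels cover the same classes the reference model saw,
    so that the coefficient matrices have matching shapes."""
    pseudo_labels = ref_model.predict(X_test)
    new_model = LogisticRegression(max_iter=1000).fit(X_test, pseudo_labels)
    ref_params = np.concatenate([ref_model.coef_.ravel(), ref_model.intercept_])
    new_params = np.concatenate([new_model.coef_.ravel(), new_model.intercept_])
    return np.linalg.norm(new_params - ref_params)
```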

The Effect of Model Size on Worst-Group Generalization

no code implementations • 8 Dec 2021 • Alan Pham, Eunice Chan, Vikranth Srivatsa, Dhruba Ghosh, Yaoqing Yang, Yaodong Yu, Ruiqi Zhong, Joseph E. Gonzalez, Jacob Steinhardt

Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known.

Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction

1 code implementation • 12 Nov 2021 • Xili Dai, Shengbang Tong, Mingyang Li, Ziyang Wu, Michael Psenka, Kwan Ho Ryan Chan, Pengyuan Zhai, Yaodong Yu, Xiaojun Yuan, Heung-Yeung Shum, Yi Ma

In particular, we propose to learn a closed-loop transcription between a multi-class multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces.

On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging

no code implementations • 30 Jun 2021 • Chris Junchi Li, Yaodong Yu, Nicolas Loizou, Gauthier Gidel, Yi Ma, Nicolas Le Roux, Michael I. Jordan

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
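For orientation, the (same-sample) stochastic extragradient update takes an extrapolation step followed by an update step, and iteration averaging returns the running mean of the iterates (restarting resets this average periodically). Up to notation:

$$ z_{t+1/2} = z_t - \eta\,\hat F(z_t), \qquad z_{t+1} = z_t - \eta\,\hat F(z_{t+1/2}), \qquad \bar z_T = \frac{1}{T} \sum_{t=1}^{T} z_t, $$

where $\hat F$ is a stochastic estimate of the operator $F(x, y) = \big(\nabla_x f(x, y),\, -\nabla_y f(x, y)\big)$ of the bilinear game.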

ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction

2 code implementations • 21 May 2021 • Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma

This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.

Data Compression

Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization

no code implementations • 27 Apr 2021 • Yaodong Yu, Tianyi Lin, Eric Mazumdar, Michael I. Jordan

Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity.

BIG-bench Machine Learning · Selection bias

Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition

1 code implementation • 17 Mar 2021 • Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma

To investigate this gap, we decompose the test risk into its bias and variance components and study their behavior as a function of adversarial training perturbation radii ($\varepsilon$).
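The decomposition referred to above is, for squared loss and ignoring label noise, the classical split of the test risk over the randomness of the training set $\mathcal{T}$ (here the adversarially trained predictor $f_{\mathcal{T}}$ also depends on the perturbation radius $\varepsilon$):

$$ \mathbb{E}_{x,\mathcal{T}}\big[(f_{\mathcal{T}}(x) - y(x))^2\big] = \underbrace{\mathbb{E}_{x}\big[(\bar f(x) - y(x))^2\big]}_{\text{bias}^2} + \underbrace{\mathbb{E}_{x}\,\mathrm{Var}_{\mathcal{T}}\big(f_{\mathcal{T}}(x)\big)}_{\text{variance}}, \qquad \bar f(x) = \mathbb{E}_{\mathcal{T}}\big[f_{\mathcal{T}}(x)\big]. $$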

Deep Networks from the Principle of Rate Reduction

3 code implementations • 27 Oct 2020 • Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma

The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme.

Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients

1 code implementation • 28 Sep 2020 • Yifei Huang, Yaodong Yu, Hongyang Zhang, Yi Ma, Yuan Yao

Even replacing only the first layer of a ResNet with such an ODE block can yield further improvement in robustness; e.g., under a PGD-20 ($\ell_\infty=0.031$) attack on the CIFAR-10 dataset, it achieves 91.57\% natural accuracy and 62.35\% robust accuracy, while a counterpart ResNet architecture trained with TRADES achieves 76.29\% natural accuracy and 45.24\% robust accuracy, respectively.

Adversarial Defense · Adversarial Robustness

Boundary thickness and robustness in learning models

1 code implementation • NeurIPS 2020 • Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms.

Adversarial Defense · Data Augmentation
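For reference, plain mixup training, which the noise-augmented variant above builds on, replaces each batch with convex combinations of shuffled example pairs and their labels. A minimal sketch, with the Beta parameter chosen arbitrarily:

```python
import numpy as np

def mixup_batch(X, Y, alpha=1.0, rng=None):
    """Return a mixup-augmented batch: convex combinations of shuffled pairs.

    X: inputs, shape (n, d);  Y: one-hot labels, shape (n, n_classes)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(X))
    X_mix = lam * X + (1.0 - lam) * X[perm]
    Y_mix = lam * Y + (1.0 - lam) * Y[perm]
    return X_mix, Y_mix
```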

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

2 code implementations • NeurIPS 2020 • Yaodong Yu, Kwan Ho Ryan Chan, Chong You, Chaobing Song, Yi Ma

To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class.

Clustering · Contrastive Learning · +1
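Up to notation, the coding-rate quantities behind $\text{MCR}^2$ can be written out directly: the rate of the whole feature set minus the size-weighted rates of the per-class subsets. The NumPy sketch below follows those expressions up to notation, so treat it as illustrative rather than the reference implementation.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 logdet(I + d/(n eps^2) Z Z^T) for features Z of shape (d, n)."""
    d, n = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)[1]

def mcr2_objective(Z, labels, eps=0.5):
    """Maximal Coding Rate Reduction: rate of the whole feature set minus the
    size-weighted rates of the per-class subsets."""
    d, n = Z.shape
    rate_whole = coding_rate(Z, eps)
    rate_classes = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        nc = Zc.shape[1]
        logdet = np.linalg.slogdet(
            np.eye(d) + (d / (nc * eps**2)) * Zc @ Zc.T)[1]
        rate_classes += (nc / (2 * n)) * logdet
    return rate_whole - rate_classes
```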

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

1 code implementation • ICML 2020 • Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma

We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network.

Theoretically Principled Trade-off between Robustness and Accuracy

8 code implementations • 24 Jan 2019 • Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.

Adversarial Attack · Adversarial Defense · +2
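The trade-off above is made explicit in the TRADES objective, which balances a natural-classification term against a robustness regularizer that pushes predictions inside the perturbation ball toward the clean prediction. Up to notation and the choice of surrogate losses:

$$ \min_{f}\; \mathbb{E}_{(X,Y)}\Big[ \mathcal{L}\big(f(X), Y\big) + \beta \max_{X' : \|X'-X\| \le \epsilon} \mathcal{L}\big(f(X), f(X')\big) \Big], $$

where the regularization weight $\beta$ (written $1/\lambda$ in the paper) controls the robustness-accuracy balance.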

Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

no code implementations • NeurIPS 2018 • Yaodong Yu, Pan Xu, Quanquan Gu

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.

Stochastic Optimization

A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery

no code implementations • ICML 2018 • Xiao Zhang, Lingxiao Wang, Yaodong Yu, Quanquan Gu

We propose a primal-dual based framework for analyzing the global optimality of nonconvex low-rank matrix recovery.

Matrix Completion

Learning One-hidden-layer ReLU Networks via Gradient Descent

no code implementations • 20 Jun 2018 • Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu

We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network.
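Concretely, the data model described above is, up to notation, a noisy one-hidden-layer ReLU teacher:

$$ x \sim \mathcal{N}(0, I_d), \qquad y = \sum_{k=1}^{K} v_k\, \sigma(w_k^\top x) + \varepsilon, \qquad \sigma(u) = \max(u, 0), $$

with zero-mean noise $\varepsilon$, and gradient descent is run on the empirical squared loss to recover the teacher's weights.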

Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima

no code implementations • 18 Dec 2017 • Yaodong Yu, Pan Xu, Quanquan Gu

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.

Stochastic Optimization

Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently

no code implementations • 11 Dec 2017 • Yaodong Yu, Difan Zou, Quanquan Gu

We propose a family of nonconvex optimization algorithms that are able to save gradient and negative curvature computations to a large extent, and are guaranteed to find an approximate local minimum with improved runtime complexity.
