Search Results for author: Colin Wei

Found 20 papers, 8 papers with code

Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence

no code implementations 16 Jun 2022 Margalit Glasgow, Colin Wei, Mary Wootters, Tengyu Ma

Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails.

Generalization Bounds • Memorization

Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations

no code implementations 6 Apr 2022 Jeff Z. HaoChen, Colin Wei, Ananya Kumar, Tengyu Ma

In particular, a linear classifier trained to separate the representations on the source domain can also predict classes on the target domain accurately, even though the representations of the two domains are far from each other.

Contrastive Learning • Unsupervised Domain Adaptation
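
As a toy illustration of the linear transferability described in this excerpt, the sketch below fits a linear head on frozen source-domain representations and scores it directly on shifted target-domain representations. The Gaussian clusters and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 16
# two classes; the target clusters are shifted away from the source clusters
src_X = np.vstack([rng.normal(0.0, 1.0, (200, dim)), rng.normal(4.0, 1.0, (200, dim))])
tgt_X = np.vstack([rng.normal(1.0, 1.0, (200, dim)), rng.normal(5.0, 1.0, (200, dim))])
y = np.array([0] * 200 + [1] * 200)

probe = LogisticRegression(max_iter=1000).fit(src_X, y)    # linear head trained on source only
print("source accuracy:", probe.score(src_X, y))
print("target accuracy:", probe.score(tgt_X, y))           # linear transferability to the target
```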

Statistically Meaningful Approximation: a Theoretical Analysis for Approximating Turing Machines with Transformers

no code implementations 29 Sep 2021 Colin Wei, Yining Chen, Tengyu Ma

A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.

Certified Robustness for Deep Equilibrium Models via Interval Bound Propagation

no code implementations ICLR 2022 Colin Wei, J Zico Kolter

Our key insights are that these interval bounds can be obtained as the fixed-point solution to an IBP-inspired equilibrium equation, and furthermore, that this solution always exists and is unique when the layer obeys a certain parameterization.
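
A minimal sketch of the idea in this excerpt, under the assumption of a simple ReLU equilibrium layer z = relu(W z + U x + b): the interval bounds are themselves computed as the fixed point of an IBP-style update. The shapes, the small-norm choice of W, and the stopping rule are illustrative, not the paper's parameterization.

```python
import numpy as np

def affine_interval(W, lo, hi):
    """Propagate an elementwise interval [lo, hi] through x -> W @ x."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi, W_pos @ hi + W_neg @ lo

def deq_ibp_bounds(W, U, b, x_lo, x_hi, n_iters=100):
    """Iterate the IBP-style equilibrium equation until the bounds stop moving."""
    ux_lo, ux_hi = affine_interval(U, x_lo, x_hi)        # bounds on U @ x
    z_lo = np.zeros(W.shape[0]); z_hi = np.zeros(W.shape[0])
    for _ in range(n_iters):
        wz_lo, wz_hi = affine_interval(W, z_lo, z_hi)    # bounds on W @ z
        new_lo = np.maximum(wz_lo + ux_lo + b, 0.0)      # relu is monotone,
        new_hi = np.maximum(wz_hi + ux_hi + b, 0.0)      # so it maps bounds to bounds
        if np.allclose(new_lo, z_lo) and np.allclose(new_hi, z_hi):
            break
        z_lo, z_hi = new_lo, new_hi
    return z_lo, z_hi

rng = np.random.default_rng(0)
W = 0.3 * rng.standard_normal((8, 8)) / np.sqrt(8)       # small norm so the update contracts
U = rng.standard_normal((8, 4)); b = rng.standard_normal(8)
x = rng.standard_normal(4); eps = 0.1
print(deq_ibp_bounds(W, U, b, x - eps, x + eps))
```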

Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers

no code implementations 28 Jul 2021 Colin Wei, Yining Chen, Tengyu Ma

A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.

Generalization Bounds

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

1 code implementation NeurIPS 2021 Colin Wei, Sang Michael Xie, Tengyu Ma

The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language.

Task 2
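
For concreteness, the sketch below samples token sequences from a plain Hidden Markov Model, the simpler of the two generative models mentioned in the excerpt; the transition and emission matrices are random placeholders, and the latent-memory augmentation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, vocab = 4, 10
T = rng.dirichlet(np.ones(n_states), size=n_states)     # state-transition matrix (rows sum to 1)
E = rng.dirichlet(np.ones(vocab), size=n_states)        # emission matrix (rows sum to 1)

def sample_hmm(length):
    h = rng.integers(n_states)                            # initial hidden state
    tokens = []
    for _ in range(length):
        tokens.append(int(rng.choice(vocab, p=E[h])))     # emit a token from the current state
        h = int(rng.choice(n_states, p=T[h]))             # transition to the next hidden state
    return tokens

print(sample_hmm(20))
```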

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

1 code implementation NeurIPS 2021 Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, Tengyu Ma

Despite the empirical successes, theoretical foundations are limited: prior analyses assume conditional independence of the positive pairs given the same class label, but recent empirical applications use heavily correlated positive pairs (i.e., data augmentations of the same image).

Contrastive Learning • Generalization Bounds • +1
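
A minimal sketch of a spectral contrastive loss of the kind analyzed here, assuming the usual two-augmentations-per-batch setup: positive pairs are rewarded through their inner products and all cross pairs are penalized through squared inner products. The batch construction and embedding dimensions are placeholders.

```python
import torch

def spectral_contrastive_loss(z1, z2):
    """z1, z2: (batch, dim) embeddings of two augmentations of the same images."""
    pos = -2.0 * (z1 * z2).sum(dim=1).mean()             # -2 E[f(x)^T f(x+)]
    sim = z1 @ z2.t()                                     # all cross-pair inner products
    neg_mask = ~torch.eye(len(z1), dtype=torch.bool)      # off-diagonal pairs act as negatives
    neg = (sim[neg_mask] ** 2).mean()                     # E[(f(x)^T f(x'))^2]
    return pos + neg

z1 = torch.randn(32, 128, requires_grad=True)
z2 = torch.randn(32, 128)
loss = spectral_contrastive_loss(z1, z2)
loss.backward()
print(loss.item())
```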

Meta-learning Transferable Representations with a Single Target Domain

no code implementations 3 Nov 2020 Hong Liu, Jeff Z. HaoChen, Colin Wei, Tengyu Ma

Recent works found that fine-tuning and joint training, two popular approaches for transfer learning, do not always improve accuracy on downstream tasks.

Meta-Learning • Representation Learning • +1

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

no code implementations ICLR 2021 Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma

Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks.

Generalization Bounds • Unsupervised Domain Adaptation
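
A minimal sketch of the self-training loop described in this excerpt, with scikit-learn linear models standing in for the deep networks the paper actually analyzes: a previously learned teacher assigns pseudolabels to unlabeled data, and a fresh student is fit to them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.standard_normal((100, 5))
y_labeled = (X_labeled[:, 0] > 0).astype(int)
X_unlabeled = rng.standard_normal((1000, 5))

teacher = LogisticRegression().fit(X_labeled, y_labeled)   # previously learned model
pseudo_labels = teacher.predict(X_unlabeled)               # pseudolabels on unlabeled data

student = LogisticRegression().fit(                        # student fits the pseudolabels
    np.vstack([X_labeled, X_unlabeled]),
    np.concatenate([y_labeled, pseudo_labels]),
)
print("student accuracy on labeled data:", student.score(X_labeled, y_labeled))
```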

Self-training Avoids Using Spurious Features Under Domain Shift

no code implementations NeurIPS 2020 Yining Chen, Colin Wei, Ananya Kumar, Tengyu Ma

In unsupervised domain adaptation, existing theory focuses on situations where the source and target domains are close.

Unsupervised Domain Adaptation

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

1 code implementation 15 Jun 2020 Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma

We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground-truth with an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms.
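
The sketch below illustrates what "SGD with label noise" means procedurally, on a quadratically over-parameterized toy regression (w = u*u - v*v) with a sparse ground truth; the parameterization, step size, and noise level are illustrative assumptions, and this toy run is not guaranteed to reproduce the paper's recovery result.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20
w_star = np.zeros(d)
w_star[:3] = 1.0                                          # sparse ground truth
X = rng.standard_normal((n, d))
y = X @ w_star

u = np.ones(d); v = np.ones(d)                            # dense, arbitrary initialization
lr, sigma = 1e-3, 0.5
for _ in range(30000):
    i = rng.integers(n)
    y_noisy = y[i] + sigma * rng.standard_normal()        # fresh label noise every step
    w = u * u - v * v
    resid = X[i] @ w - y_noisy
    u -= lr * resid * 2 * u * X[i]                        # chain rule through w = u*u - v*v
    v += lr * resid * 2 * v * X[i]

print(np.round(u * u - v * v, 2))                         # compare against the sparse w_star
```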

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin

no code implementations ICLR 2020 Colin Wei, Tengyu Ma

For linear classifiers, the relationship between (normalized) output margin and generalization is captured in a clear and simple bound – a large output margin implies good generalization.

Generalization Bounds • Robust Classification
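
For reference, the normalized output margin this excerpt refers to can be computed directly for a binary linear classifier, as in the sketch below; the data and weights are placeholders.

```python
import numpy as np

def normalized_margin(w, X, y):
    """Smallest margin y_i * <w, x_i> / ||w|| over the training set (y in {-1, +1})."""
    return np.min(y * (X @ w)) / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w = rng.standard_normal(10)
y = np.sign(X @ w)                                        # labels consistent with w
print(normalized_margin(w, X, y))                         # larger margin -> better guarantee
```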

The Implicit and Explicit Regularization Effects of Dropout

1 code implementation ICML 2020 Colin Wei, Sham Kakade, Tengyu Ma

This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent.

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin

1 code implementation 9 Oct 2019 Colin Wei, Tengyu Ma

Unfortunately, for deep models, this relationship is less clear: existing analyses of the output margin give complicated bounds which sometimes depend exponentially on depth.

General Classification • Generalization Bounds • +1

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks

2 code implementations NeurIPS 2019 Yuanzhi Li, Colin Wei, Tengyu Ma

This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with small initial learning rate, but ignored by the model with large learning rate until after annealing.
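
A minimal sketch of the kind of intervention described here: stamping a small, class-dependent patch onto the corner of CIFAR-10-sized images so that the patch alone predicts the label. The patch size, position, and encoding are illustrative assumptions.

```python
import numpy as np

def add_class_patch(images, labels, patch_size=3):
    """images: (N, 32, 32, 3) floats in [0, 1]; labels: (N,) ints in [0, 10)."""
    stamped = images.copy()
    for img, lab in zip(stamped, labels):
        # encode the label as a constant-intensity patch in the top-left corner
        img[:patch_size, :patch_size, :] = (lab + 1) / 10.0
    return stamped

rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3)).astype(np.float32)
labels = rng.integers(0, 10, size=8)
patched = add_class_patch(images, labels)
print(patched[0, :3, :3, 0])                              # the stamped corner encodes the label
```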

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

7 code implementations NeurIPS 2019 Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma

Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes.

Long-tail learning with class descriptors
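
A minimal sketch of a label-distribution-aware margin style loss: rarer classes receive larger enforced margins (proportional to n_j^(-1/4)), implemented by subtracting the margin from the true-class logit before cross-entropy. The margin scaling constant is an illustrative choice, and other training details from the paper are omitted.

```python
import torch
import torch.nn.functional as F

class LDAMStyleLoss(torch.nn.Module):
    """Cross-entropy with larger enforced margins for rarer classes (margin ~ n_j^(-1/4))."""
    def __init__(self, class_counts, max_margin=0.5):
        super().__init__()
        raw = 1.0 / torch.tensor(class_counts, dtype=torch.float32).pow(0.25)
        self.margins = raw * (max_margin / raw.max())     # rescale so the largest margin = max_margin

    def forward(self, logits, target):
        # subtract the class-dependent margin from the true-class logit only
        margin = F.one_hot(target, logits.size(1)).float() * self.margins[target].unsqueeze(1)
        return F.cross_entropy(logits - margin, target)

loss_fn = LDAMStyleLoss(class_counts=[5000, 500, 50])     # heavy class imbalance
logits = torch.randn(8, 3, requires_grad=True)
target = torch.randint(0, 3, (8,))
loss_fn(logits, target).backward()
print(loss_fn.margins)                                    # rarest class gets the largest margin
```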

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation

1 code implementation NeurIPS 2019 Colin Wei, Tengyu Ma

For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers.
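
As a concrete illustration of the data-dependent quantities mentioned here, the sketch below measures the norm of a hidden layer and the spectral norm of the Jacobian of the later layers with respect to it, at a single input; the small placeholder network and the choice of layers are assumptions.

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

x = torch.randn(10)
h1 = net[1](net[0](x)).detach()                           # first hidden layer at a data point
tail = lambda h: net[4](net[3](net[2](h)))                # layers after h1

jac = torch.autograd.functional.jacobian(tail, h1)        # d(output)/d(h1), shape (2, 32)
print("hidden-layer norm:", h1.norm().item())
print("Jacobian spectral norm:", torch.linalg.matrix_norm(jac, ord=2).item())
```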

On the Margin Theory of Feedforward Neural Networks

no code implementations ICLR 2019 Colin Wei, Jason Lee, Qiang Liu, Tengyu Ma

We establish: 1) for multi-layer feedforward relu networks, the global minimizer of a weakly-regularized cross-entropy loss has the maximum normalized margin among all networks, 2) as a result, increasing the over-parametrization improves the normalized margin and generalization error bounds for deep networks.

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

no code implementations NeurIPS 2019 Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma

We prove that for infinite-width two-layer nets, noisy gradient descent optimizes the regularized neural net loss to a global minimum in polynomial iterations.

Markov Chain Truncation for Doubly-Intractable Inference

no code implementations 15 Oct 2016 Colin Wei, Iain Murray

Computing partition functions, the normalizing constants of probability distributions, is often hard.
