Search Results for author: Vatsal Sharan

Found 25 papers, 4 papers with code

Simplicity Bias of Transformers to Learn Low Sensitivity Functions

no code implementations 11 Mar 2024 Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Vatsal Sharan

Transformers achieve state-of-the-art accuracy and robustness across many tasks, but an understanding of their inductive biases, and of how those biases differ from those of other neural network architectures, remains elusive.

Learnability is a Compact Property

no code implementations 15 Feb 2024 Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng

Furthermore, the learnability of such problems can fail to be a property of finite character: informally, it cannot be detected by examining finite projections of the problem.

Learning Theory

Stability and Multigroup Fairness in Ranking with Uncertain Predictions

no code implementations 14 Feb 2024 Siddartha Devic, Aleksandra Korolova, David Kempe, Vatsal Sharan

However, when predictors trained for classification tasks have intrinsic uncertainty, it is not obvious how this uncertainty should be represented in the derived rankings.

Fairness

Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

no code implementations 26 Oct 2023 Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan

In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL.
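
The study focuses on linear models, where a canonical higher-order method is iterative Newton (Newton-Schulz) for inverting $X^\top X$. The sketch below is only a reference implementation of that classical iteration, not the paper's Transformer construction; the function name and iteration count are illustrative.

    import numpy as np

    def newton_least_squares(X, y, iters=10):
        # Approximate w = (X^T X)^{-1} X^T y with Newton-Schulz iterations
        # M_{t+1} = M_t (2I - A M_t), which converge quadratically to A^{-1}.
        A = X.T @ X
        d = A.shape[0]
        # Standard safe initialization: M_0 = A^T / (||A||_1 ||A||_inf).
        M = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
        I = np.eye(d)
        for _ in range(iters):
            M = M @ (2 * I - A @ M)
        return M @ (X.T @ y)

    # Sanity check against the closed-form least-squares solution.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5)
    assert np.allclose(newton_least_squares(X, y),
                       np.linalg.lstsq(X, y, rcond=None)[0], atol=1e-6)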

In-Context Learning

Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness

1 code implementation 9 Oct 2023 Bhavya Vasudeva, Kameron Shahabi, Vatsal Sharan

Neural networks (NNs) are known to exhibit simplicity bias where they tend to prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative.

Fairness

Regularization and Optimal Multiclass Learning

no code implementations 24 Sep 2023 Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng

We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and exhibit an optimal learner using maximum entropy programs.

Transductive Learning

Fairness in Matching under Uncertainty

no code implementations 8 Feb 2023 Siddartha Devic, David Kempe, Vatsal Sharan, Aleksandra Korolova

The prevalence and importance of algorithmic two-sided marketplaces have drawn attention to the issue of fairness in such settings.

Fairness

Efficient Convex Optimization Requires Superlinear Memory

no code implementations 29 Mar 2022 Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant

We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - \delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1 + (4/3)\delta})$ first-order queries (for any constant $\delta \in [0, 1/4]$).

KL Divergence Estimation with Multi-group Attribution

1 code implementation 28 Feb 2022 Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi Wieder

Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory.
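
For context, the naive baseline for this problem on a finite alphabet is the plug-in estimator sketched below; it is not the multi-group attribution estimator developed in the paper, and the smoothing constant is an illustrative choice to keep the log-ratios finite.

    import numpy as np
    from collections import Counter

    def plugin_kl(samples_p, samples_q, alphabet, smoothing=0.5):
        # Naive estimate of KL(P || Q) from smoothed empirical frequencies.
        def empirical(samples):
            counts = Counter(samples)
            total = len(samples) + smoothing * len(alphabet)
            return {x: (counts[x] + smoothing) / total for x in alphabet}
        p, q = empirical(samples_p), empirical(samples_q)
        return sum(p[x] * np.log(p[x] / q[x]) for x in alphabet)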

Fairness

On the Statistical Complexity of Sample Amplification

no code implementations 12 Jan 2022 Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant

In this work, we place the sample amplification problem on a firm statistical foundation by deriving generally applicable amplification procedures, lower bound techniques and connections to existing statistical notions.

Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales

no code implementations 4 Nov 2021 Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan

We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions, and we provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square roots of the condition numbers of the components.

Omnipredictors

no code implementations 11 Sep 2021 Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder

We suggest a rigorous new paradigm for loss minimization in machine learning where the loss function can be ignored at the time of learning and only be taken into account when deciding an action.

Fairness

Multicalibrated Partitions for Importance Weights

no code implementations 10 Mar 2021 Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder

We significantly strengthen previous work based on the MaxEntropy approach, which defines the importance weights via a distribution $Q$ that is closest to $P$ and looks the same as $R$ on every set $C \in \mathcal{C}$, where $\mathcal{C}$ may be a huge collection of sets.

Anomaly Detection · Domain Adaptation

Sample Amplification: Increasing Dataset Size even when Learning is Impossible

no code implementations ICML 2020 Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant

In the Gaussian case, we show that an $\left(n, n+\Theta\left(\frac{n}{\sqrt{d}}\right)\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples.
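
For intuition, the most natural amplifier in this setting is "learn then generate": append fresh draws from a Gaussian centered at the empirical mean. The sketch below (which assumes identity covariance for simplicity) only illustrates the idea; the paper's procedures and the $\Theta(n/\sqrt{d})$ guarantee are more delicate than this heuristic.

    import numpy as np

    def naive_gaussian_amplifier(samples, n_extra, seed=0):
        # "Learn then generate": fit the mean, then append fresh draws from
        # N(mean, I). Illustrative only; not the paper's optimal amplifier.
        mean = samples.mean(axis=0)
        d = samples.shape[1]
        extra = mean + np.random.default_rng(seed).normal(size=(n_extra, d))
        return np.vstack([samples, extra])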


Memory-Sample Tradeoffs for Linear Regression with Small Error

no code implementations 18 Apr 2019 Vatsal Sharan, Aaron Sidford, Gregory Valiant

We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints.

Regression

A Spectral View of Adversarially Robust Features

no code implementations NeurIPS 2018 Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant

This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset.
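
As a rough illustration of the spectral construction, the sketch below builds a distance-threshold graph on the dataset and uses low-order eigenvectors of its Laplacian as features. The graph choice and threshold are assumptions made for illustration; the paper gives the precise construction and its robustness guarantees.

    import numpy as np

    def spectral_features(X, threshold, n_features=2):
        # Connect points within `threshold` of each other, form the unnormalized
        # graph Laplacian, and return its lowest non-trivial eigenvectors.
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        W = (dists <= threshold).astype(float)
        np.fill_diagonal(W, 0.0)
        L = np.diag(W.sum(axis=1)) - W
        _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
        return eigvecs[:, 1:1 + n_features]     # skip the constant eigenvector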

Efficient Anomaly Detection via Matrix Sketching

no code implementations NeurIPS 2018 Vatsal Sharan, Parikshit Gopalan, Udi Wieder

We consider the problem of finding anomalies in high-dimensional data using popular PCA-based anomaly scores.
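
Two popular PCA-based scores here are the rank-$k$ leverage score (mass inside the top-$k$ principal subspace, rescaled by the singular values) and the rank-$k$ projection distance (residual outside that subspace). The sketch below computes both exactly via a full SVD; the point of the paper is that they can be approximated from small matrix sketches, which this snippet does not do.

    import numpy as np

    def pca_anomaly_scores(A, k):
        # Exact rank-k leverage scores and projection distances for the rows of A.
        U, S, Vt = np.linalg.svd(A, full_matrices=False)
        proj = A @ Vt[:k].T                      # coordinates in the top-k subspace
        leverage = np.sum((proj / S[:k]) ** 2, axis=1)
        residual = np.sum(A ** 2, axis=1) - np.sum(proj ** 2, axis=1)
        return leverage, residual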

Anomaly Detection

Sketching Linear Classifiers over Data Streams

1 code implementation 7 Nov 2017 Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model.
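
A rough, hedged sketch of the idea (not the paper's exact construction, scaling, or guarantees): learn a linear classifier on a count-sketch projection of the feature space, and estimate any original weight by taking a median of its signed buckets across the hash rows. All class and parameter names below are illustrative.

    import numpy as np

    class WeightMedianLikeSketch:
        def __init__(self, dim, rows=5, width=256, lr=0.1, seed=0):
            rng = np.random.default_rng(seed)
            self.bucket = rng.integers(0, width, size=(rows, dim))  # hash h_r(j)
            self.sign = rng.choice([-1.0, 1.0], size=(rows, dim))   # sign s_r(j)
            self.z = np.zeros((rows, width))                        # sketched weights
            self.rows, self.lr = rows, lr

        def _project(self, x):
            # Count-sketch projection of a dense feature vector x.
            xs = np.zeros_like(self.z)
            for r in range(self.rows):
                np.add.at(xs[r], self.bucket[r], self.sign[r] * x)
            return xs / np.sqrt(self.rows)

        def update(self, x, y):
            # One online logistic-regression step in the sketched space, y in {-1, +1}.
            xs = self._project(x)
            margin = y * np.sum(self.z * xs)
            self.z += self.lr * y * xs / (1.0 + np.exp(margin))

        def weight(self, j):
            # Median-across-rows estimate of the j-th weight of the linear model.
            vals = [np.sqrt(self.rows) * self.sign[r, j] * self.z[r, self.bucket[r, j]]
                    for r in range(self.rows)]
            return float(np.median(vals))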

Feature Selection

Learning Overcomplete HMMs

no code implementations NeurIPS 2017 Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.

Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use

no code implementations ICML 2017 Vatsal Sharan, Gregory Valiant

The popular Alternating Least Squares (ALS) algorithm for tensor decomposition is efficient and easy to implement, but often converges to poor local optima---particularly when the weights of the factors are non-uniform.
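
For reference, plain CP-ALS for a third-order tensor is sketched below (random initialization, fixed iteration count). The Orth-ALS variant from the paper additionally orthogonalizes the factor estimates between iterations, which this baseline deliberately omits.

    import numpy as np

    def khatri_rao(U, V):
        # Column-wise Kronecker product: (U ⊙ V)[i*J + j, r] = U[i, r] * V[j, r].
        (I, R), (J, _) = U.shape, V.shape
        return np.einsum('ir,jr->ijr', U, V).reshape(I * J, R)

    def cp_als(T, rank, iters=50, seed=0):
        # Plain CP decomposition of a 3-way tensor by alternating least squares.
        I, J, K = T.shape
        rng = np.random.default_rng(seed)
        A = rng.normal(size=(I, rank))
        B = rng.normal(size=(J, rank))
        C = rng.normal(size=(K, rank))
        # Mode-n unfoldings, with the last remaining mode varying slowest.
        T1 = T.transpose(0, 2, 1).reshape(I, K * J)
        T2 = T.transpose(1, 2, 0).reshape(J, K * I)
        T3 = T.transpose(2, 1, 0).reshape(K, J * I)
        for _ in range(iters):
            A = T1 @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
            B = T2 @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
            C = T3 @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        return A, B, C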

Tensor Decomposition · Word Embeddings

Prediction with a Short Memory

no code implementations 8 Dec 2016 Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.
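
The "trivial prediction algorithm" mentioned here is essentially an order-$\ell$ Markov (n-gram-style) predictor built from empirical window frequencies; a minimal sketch, with the window length left as a free parameter, is:

    from collections import Counter, defaultdict

    def window_frequency_predictor(sequence, window):
        # Estimate P(next symbol | previous `window` symbols) from empirical
        # counts of length-(window + 1) windows in the training sequence.
        counts = defaultdict(Counter)
        for t in range(window, len(sequence)):
            counts[tuple(sequence[t - window:t])][sequence[t]] += 1

        def predict(context):
            dist = counts.get(tuple(context[-window:]), Counter())
            total = sum(dist.values())
            return {s: c / total for s, c in dist.items()} if total else {}

        return predict

    # Example: learn from a periodic sequence and predict the next symbol.
    predict = window_frequency_predictor("abcabcabc", window=2)
    assert predict("ab") == {"c": 1.0}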
