no code implementations • 5 Dec 2023 • Spencer Compton, Gregory Valiant
Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean?
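A natural baseline for this setting, assuming each variable contributes enough samples to estimate its own variance (an assumption of this sketch, not necessarily the paper's regime), is plug-in inverse-variance weighting:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.0                                # the common mean
sigmas = np.array([0.5, 1.0, 4.0])      # unknown, heterogeneous std devs
m = 50                                  # samples per variable (assumed here)

# Draw m samples from each Gaussian.
data = rng.normal(mu, sigmas[:, None], size=(len(sigmas), m))

# Plug-in inverse-variance weighting: estimate each variance from the data,
# then weight each per-variable sample mean by 1 / estimated variance.
means = data.mean(axis=1)
vars_hat = data.var(axis=1, ddof=1)
weights = 1.0 / vars_hat
mu_hat = float(np.sum(weights * means) / np.sum(weights))
```

The question the paper studies is precisely how much better one can do than such plug-in schemes.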
no code implementations • 19 Nov 2023 • Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant
From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance.
no code implementations • 6 Jun 2023 • Steven Cao, Percy Liang, Gregory Valiant
We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns.
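To illustrate why rescaled products of observed entries can recover $X^TX$, here is a sketch under an assumed observation model in which each row reveals a uniformly random pair of coordinates (the paper's model and algorithm may differ in the details):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 20000, 4, 2                    # many rows, low rank (illustrative)
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d))

# Each row reveals one uniformly random pair of coordinates (assumed model).
pairs = [rng.choice(d, size=2, replace=False) for _ in range(n)]

G = np.zeros((d, d))                     # accumulates rescaled products
num_pairs = d * (d - 1) / 2
for x, (j, k) in zip(X, pairs):
    # Off-diagonal: pair {j,k} is seen w.p. 1/C(d,2), so rescale by C(d,2).
    G[j, k] += x[j] * x[k] * num_pairs
    G[k, j] += x[j] * x[k] * num_pairs
    # Diagonal: coordinate j is seen w.p. 2/d, so rescale by d/2.
    G[j, j] += x[j] ** 2 * d / 2
    G[k, k] += x[k] ** 2 * d / 2

rel_err = np.linalg.norm(G - X.T @ X) / np.linalg.norm(X.T @ X)
```

Each rescaled term is unbiased for its entry of $X^TX$, so with many rows the estimate concentrates; the paper's contribution is showing how few rows suffice.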
2 code implementations • 1 Aug 2022 • Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class?
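A minimal sketch of the data model: a prompt consists of (input, label) pairs from a freshly drawn linear function, and the natural reference predictor on such a prompt is least squares (the paper trains a Transformer and compares it against baselines of this flavor):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_points = 5, 20                     # dimension and prompt length (illustrative)

# One in-context "prompt": pairs (x_i, w . x_i) from a random linear function.
w = rng.normal(size=d)
X = rng.normal(size=(n_points, d))
y = X @ w
x_query = rng.normal(size=d)

# For noiseless linear data the optimal in-context predictor is least
# squares on the prompt; with enough points it recovers w exactly.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = float(x_query @ w_hat)
```

An in-context learner never sees $w$ directly; it must infer it from the prompt at inference time, which is what makes the problem well-defined.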
no code implementations • 29 Mar 2022 • Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant
We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - \delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1 + (4/3)\delta})$ first-order queries (for any constant $\delta \in [0, 1/4]$).
no code implementations • 12 Jan 2022 • Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant
In this work, we place the sample amplification problem on a firm statistical foundation by deriving generally applicable amplification procedures, lower bound techniques and connections to existing statistical notions.
no code implementations • 4 Nov 2021 • Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan
We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square-root of the condition numbers of the components.
no code implementations • ACL 2021 • Kartik Chandra, Chuma Kabaghe, Gregory Valiant
Our results suggest that polyperceivable examples are surprisingly prevalent in natural language, existing for >2% of English words.
no code implementations • 29 Jun 2021 • Mingda Qiao, Gregory Valiant
We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time.
1 code implementation • 17 Feb 2021 • Kai Sheng Tai, Peter Bailis, Gregory Valiant
Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training.
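The basic loop can be sketched with a nearest-centroid base learner (an illustrative stand-in for the models studied in the paper): fit on the labeled data, pseudo-label the unlabeled pool with the model's own predictions, and refit on everything.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes; only a few points are labeled.
X_lab = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unlab = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])

def fit_centroids(X, y):
    # Base learner: one centroid per class.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Self-training: pseudo-label the unlabeled pool, then refit on everything.
centroids = fit_centroids(X_lab, y_lab)
for _ in range(3):
    pseudo = predict(centroids, X_unlab)
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    centroids = fit_centroids(X_all, y_all)
```

Practical variants typically keep only high-confidence pseudo-labels; this sketch keeps them all for brevity.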
no code implementations • 13 Jan 2021 • Annie Marsden, John Duchi, Gregory Valiant
We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model.
no code implementations • 7 Dec 2020 • Mingda Qiao, Gregory Valiant
In this paper, we prove an $\Omega(T^{0.528})$ bound on the calibration error which, to the best of our knowledge, is the first super-$\sqrt{T}$ lower bound for this setting.
2 code implementations • ICML 2020 • Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré
We validate our proposed scheme on image and text datasets.
no code implementations • 12 Dec 2019 • Weihao Kong, Gregory Valiant, Emma Brunskill
We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting.
1 code implementation • NeurIPS 2020 • Justin Y. Chen, Gregory Valiant, Paul Valiant
Crucially, we assume that the sets $A$ and $B$ are drawn according to some known distribution $P$ over pairs of subsets of $[n]$.
4 code implementations • NeurIPS 2019 • Antonio Ginart, Melody Y. Guan, Gregory Valiant, James Zou
Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used --- the EU's Right To Be Forgotten regulation is an example of this effort.
no code implementations • 3 Jun 2019 • Melody Y. Guan, Gregory Valiant
Recent work on adversarial examples has demonstrated that most natural inputs can be perturbed to fool even state-of-the-art machine learning systems.
no code implementations • ICML 2020 • Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant
In the Gaussian case, we show that an $\left(n, n+\Theta(\frac{n}{\sqrt{d}})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples.
no code implementations • 19 Apr 2019 • Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant
We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term: the sum, over the data points, of the squared $\ell_2$ norm of the gradient of the model with respect to the parameter vector, evaluated at each data point.
no code implementations • 18 Apr 2019 • Vatsal Sharan, Aaron Sidford, Gregory Valiant
We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints.
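For context, the memory-efficient end of the tradeoff is single-pass streaming SGD, which stores only the current iterate in $O(d)$ memory, whereas second-order methods store $O(d^2)$ state. A sketch (illustrative, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_true = rng.normal(size=d)

# Streaming SGD: one pass, constant memory per example, O(d) total state.
w = np.zeros(d)
lr = 0.01
for _ in range(20000):
    x = rng.normal(size=d)
    y = x @ w_true                      # noiseless label for this sketch
    w -= lr * (x @ w - y) * x           # gradient step on the squared loss
```

The paper's lower bound says that any algorithm restricted to subquadratic memory, like this one, must converge more slowly than unconstrained algorithms on some instances.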
no code implementations • 12 Feb 2019 • Mingda Qiao, Gregory Valiant
The algorithm is allowed to choose when to make the prediction as well as the length of the prediction window, possibly depending on the observations so far.
no code implementations • 12 Feb 2019 • Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade
Precisely, for sufficiently large $N$, the MLE achieves the information-theoretically optimal error bound of $\mathcal{O}(\frac{1}{t})$ for $t < c\log{N}$, with respect to the earth mover's distance (between the estimated and true distributions).
3 code implementations • 25 Jan 2019 • Kai Sheng Tai, Peter Bailis, Gregory Valiant
How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network?
no code implementations • NeurIPS 2018 • Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant
This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset.
no code implementations • NeurIPS 2018 • Weihao Kong, Gregory Valiant
In this setting, we show that with $O(\sqrt{d})$ samples, one can accurately estimate the fraction of the variance of the label that can be explained via the best linear function of the data.
no code implementations • 22 Nov 2017 • Mingda Qiao, Gregory Valiant
Specifically, we consider the setting where there is some underlying distribution, $p$, and each data source provides a batch of $\ge k$ samples, with the guarantee that at least a $(1-\epsilon)$ fraction of the sources draw their samples from a distribution with total variation distance at most $\eta$ from $p$.
no code implementations • NeurIPS 2017 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.
1 code implementation • 7 Nov 2017 • Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant
We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model.
no code implementations • NeurIPS 2017 • Kevin Tian, Weihao Kong, Gregory Valiant
Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_i \in [0, 1]$, and we observe $n$ independent random variables, $X_1,\ldots, X_n$, with $X_i \sim $ Binomial$(t, p_i)$.
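A hedged illustration of why pooling across entities helps: simple empirical-Bayes shrinkage of the naive estimates $X_i/t$ toward the grand mean already beats them in mean squared error (the paper's estimators recover much finer structure than this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 5000, 10
p = rng.beta(2, 2, size=n)              # unknown parameters (illustrative prior)
X = rng.binomial(t, p)

naive = X / t                           # per-entity MLE

# Decompose the observed variance into signal (spread of the p_i) plus
# binomial noise, then shrink by the estimated signal fraction.
noise = np.mean(X * (t - X) / (t**2 * (t - 1)))   # avg noise variance of X/t
signal = max(naive.var() - noise, 0.0)
shrunk = naive.mean() + signal / (signal + noise) * (naive - naive.mean())

mse_naive = float(np.mean((naive - p) ** 2))
mse_shrunk = float(np.mean((shrunk - p) ** 2))
```

Here `X * (t - X) / (t**2 * (t - 1))` is an unbiased estimate of each entity's noise variance $p_i(1-p_i)/t$, which is what makes the signal/noise split well-founded.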
no code implementations • 9 Aug 2017 • Michela Meister, Gregory Valiant
This setting can be viewed as an instance of the semi-verified learning model introduced in [CSV17], which explores the tradeoff between the number of items evaluated by each worker and the fraction of good evaluators.
no code implementations • 25 Jun 2017 • Vatsal Sharan, Kai Sheng Tai, Peter Bailis, Gregory Valiant
What learning algorithms can be run directly on compressively-sensed data?
no code implementations • 15 Mar 2017 • Jacob Steinhardt, Moses Charikar, Gregory Valiant
We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data.
no code implementations • ICML 2017 • Vatsal Sharan, Gregory Valiant
The popular Alternating Least Squares (ALS) algorithm for tensor decomposition is efficient and easy to implement, but often converges to poor local optima---particularly when the weights of the factors are non-uniform.
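For reference, here is one ALS cycle for a rank-one CP decomposition, where each least-squares subproblem has a closed form; the paper analyzes a modified variant of this loop, so treat this as a baseline sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# A rank-one tensor a (x) b (x) c to be recovered (illustrative).
a, b, c = rng.normal(size=(3, n))
T = np.einsum('i,j,k->ijk', a, b, c)

# ALS: fix two factors, solve for the third in closed form, and cycle.
x, y, z = rng.normal(size=(3, n))
for _ in range(20):
    x = np.einsum('ijk,j,k->i', T, y, z) / ((y @ y) * (z @ z))
    y = np.einsum('ijk,i,k->j', T, x, z) / ((x @ x) * (z @ z))
    z = np.einsum('ijk,i,j->k', T, x, y) / ((x @ x) * (y @ y))

T_hat = np.einsum('i,j,k->ijk', x, y, z)
```

On an exact rank-one tensor ALS converges essentially immediately; the convergence problems the abstract refers to arise at higher rank, especially with non-uniform factor weights.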
no code implementations • 8 Dec 2016 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.
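The trivial predictor is concrete: estimate the distribution of the next symbol conditioned on the previous length-$\ell$ window from empirical counts. A sketch on a small synthetic HMM (all parameters are illustrative):

```python
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(0)

# A 2-state HMM with sticky hidden states (illustrative parameters).
T = np.array([[0.95, 0.05], [0.05, 0.95]])   # hidden-state transitions
E = np.array([[0.9, 0.1], [0.1, 0.9]])       # emission probs over {0, 1}
N = 50000

h = 0
obs = []
for u, v in rng.random(size=(N, 2)):
    obs.append(int(u < E[h][1]))             # emit symbol 1 w.p. E[h][1]
    h = int(v < T[h][1])                     # move to state 1 w.p. T[h][1]

# Trivial predictor: empirical next-symbol counts per length-ell window.
ell = 5
counts = defaultdict(Counter)
for i in range(N - ell):
    counts[tuple(obs[i:i + ell])][obs[i + ell]] += 1

def predict(window):
    c = counts[tuple(window)]
    return max(c, key=c.get) if c else 0
```

The point of the result is that windows of length $O(\log n/\epsilon)$ suffice to approach the information-theoretic error $I$, given a long enough training sequence.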
no code implementations • 7 Nov 2016 • Moses Charikar, Jacob Steinhardt, Gregory Valiant
For example, given a dataset of $n$ points for which an unknown subset of $\alpha n$ points are drawn from a distribution of interest, and no assumptions are made about the remaining $(1-\alpha)n$ points, is it possible to return a list of $\operatorname{poly}(1/\alpha)$ answers, one of which is correct?
no code implementations • NeurIPS 2016 • Jacob Steinhardt, Gregory Valiant, Moses Charikar
We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers.
no code implementations • 21 Feb 2016 • Qingqing Huang, Sham M. Kakade, Weihao Kong, Gregory Valiant
When can accurate reconstruction be accomplished in the sparse data regime?
1 code implementation • 30 Jan 2016 • Weihao Kong, Gregory Valiant
We consider this fundamental recovery problem in the regime where the number of samples is comparable, or even sublinear in the dimensionality of the distribution in question.
no code implementations • 21 Apr 2015 • Gregory Valiant, Paul Valiant
One conceptual implication of this result is that for large samples, Bayesian assumptions on the "shape" or bounds on the tail probabilities of a distribution over discrete support are not helpful for the task of learning the distribution.
no code implementations • NeurIPS 2015 • Bhaswar B. Bhattacharya, Gregory Valiant
We consider the problem of closeness testing for two discrete distributions in the practically relevant setting of \emph{unequal} sized samples drawn from each of them.
no code implementations • NeurIPS 2013 • Paul Valiant, Gregory Valiant
Recently, [Valiant and Valiant] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear-sized sample.
no code implementations • 7 Oct 2013 • Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant
This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples n and the data dimension d are relatively large.
no code implementations • 19 Aug 2013 • Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, Paul Valiant
We study the question of closeness testing for two discrete distributions.
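A standard building block for such testers is an unbiased (under Poissonization) estimator of $\|p-q\|_2^2$ computed from the two sample histograms; a sketch (illustrative, not the specific tester of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000        # domain size
m = 20000       # samples drawn from each distribution

p = np.full(n, 1.0 / n)
q = p.copy()
q[: n // 2] *= 1.5          # reweight half the domain up ...
q[n // 2:] *= 0.5           # ... and half down, so q != p but sums to 1

def l2_stat(xs, ys, num):
    # Estimates m^2 * ||p - q||_2^2: the -X - Y terms cancel the
    # additive bias of the squared count difference.
    X = np.bincount(xs, minlength=num)
    Y = np.bincount(ys, minlength=num)
    return int(np.sum((X - Y) ** 2 - X - Y))

same = l2_stat(rng.choice(n, size=m, p=p), rng.choice(n, size=m, p=p), n)
diff = l2_stat(rng.choice(n, size=m, p=p), rng.choice(n, size=m, p=q), n)
```

The statistic concentrates near zero when the distributions are equal and near $m^2\|p-q\|_2^2$ otherwise, so thresholding it distinguishes the two cases.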