no code implementations • 5 Dec 2023 • Spencer Compton, Gregory Valiant
Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean?
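A natural baseline for this setting, assuming each variable contributes enough samples to estimate its own variance (an assumption of this sketch, not necessarily the paper's regime), is plug-in inverse-variance weighting:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.0                                # the common mean
sigmas = np.array([0.5, 1.0, 4.0])      # unknown, heterogeneous std devs
m = 50                                  # samples per variable (assumed here)

# Draw m samples from each Gaussian.
data = rng.normal(mu, sigmas[:, None], size=(len(sigmas), m))

# Plug-in inverse-variance weighting: estimate each variance from the data,
# then weight each per-variable sample mean by 1 / estimated variance.
means = data.mean(axis=1)
vars_hat = data.var(axis=1, ddof=1)
weights = 1.0 / vars_hat
mu_hat = float(np.sum(weights * means) / np.sum(weights))
```

The question the paper studies is precisely how much better one can do than such plug-in schemes.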
no code implementations • 19 Nov 2023 • Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant
From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance.
no code implementations • 6 Jun 2023 • Steven Cao, Percy Liang, Gregory Valiant
We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns.
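To illustrate why rescaled products of observed entries can recover $X^TX$, here is a sketch under an assumed observation model in which each row reveals a uniformly random pair of coordinates (the paper's model and algorithm may differ in the details):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 20000, 4, 2                    # many rows, low rank (illustrative)
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d))

# Each row reveals one uniformly random pair of coordinates (assumed model).
pairs = [rng.choice(d, size=2, replace=False) for _ in range(n)]

G = np.zeros((d, d))                     # accumulates rescaled products
num_pairs = d * (d - 1) / 2
for x, (j, k) in zip(X, pairs):
    # Off-diagonal: pair {j,k} is seen w.p. 1/C(d,2), so rescale by C(d,2).
    G[j, k] += x[j] * x[k] * num_pairs
    G[k, j] += x[j] * x[k] * num_pairs
    # Diagonal: coordinate j is seen w.p. 2/d, so rescale by d/2.
    G[j, j] += x[j] ** 2 * d / 2
    G[k, k] += x[k] ** 2 * d / 2

rel_err = np.linalg.norm(G - X.T @ X) / np.linalg.norm(X.T @ X)
```

Each rescaled term is unbiased for its entry of $X^TX$, so with many rows the estimate concentrates; the paper's contribution is showing how few rows suffice.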
2 code implementations • 1 Aug 2022 • Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class?
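A minimal sketch of the data model: a prompt consists of (input, label) pairs from a freshly drawn linear function, and the natural reference predictor on such a prompt is least squares (the paper trains a Transformer and compares it against baselines of this flavor):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_points = 5, 20                     # dimension and prompt length (illustrative)

# One in-context "prompt": pairs (x_i, w . x_i) from a random linear function.
w = rng.normal(size=d)
X = rng.normal(size=(n_points, d))
y = X @ w
x_query = rng.normal(size=d)

# For noiseless linear data the optimal in-context predictor is least
# squares on the prompt; with enough points it recovers w exactly.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = float(x_query @ w_hat)
```

An in-context learner never sees $w$ directly; it must infer it from the prompt at inference time, which is what makes the problem well-defined.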
no code implementations • 29 Mar 2022 • Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant
We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - \delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1 + (4/3)\delta})$ first-order queries (for any constant $\delta \in [0, 1/4]$).
no code implementations • 12 Jan 2022 • Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant
In this work, we place the sample amplification problem on a firm statistical foundation by deriving generally applicable amplification procedures, lower bound techniques and connections to existing statistical notions.
no code implementations • 4 Nov 2021 • Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan
We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square-root of the condition numbers of the components.
no code implementations • ACL 2021 • Kartik Chandra, Chuma Kabaghe, Gregory Valiant
Our results suggest that polyperceivable examples are surprisingly prevalent in natural language, existing for >2% of English words.
no code implementations • 29 Jun 2021 • Mingda Qiao, Gregory Valiant
We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time.
1 code implementation • 17 Feb 2021 • Kai Sheng Tai, Peter Bailis, Gregory Valiant
Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training.
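The basic loop can be sketched with a nearest-centroid base learner (an illustrative stand-in for the models studied in the paper): fit on the labeled data, pseudo-label the unlabeled pool with the model's own predictions, and refit on everything.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes; only a few points are labeled.
X_lab = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unlab = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])

def fit_centroids(X, y):
    # Base learner: one centroid per class.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Self-training: pseudo-label the unlabeled pool, then refit on everything.
centroids = fit_centroids(X_lab, y_lab)
for _ in range(3):
    pseudo = predict(centroids, X_unlab)
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    centroids = fit_centroids(X_all, y_all)
```

Practical variants typically keep only high-confidence pseudo-labels; this sketch keeps them all for brevity.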
no code implementations • 13 Jan 2021 • Annie Marsden, John Duchi, Gregory Valiant
We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model.
no code implementations • 7 Dec 2020 • Mingda Qiao, Gregory Valiant
In this paper, we prove an $\Omega(T^{0.528})$ bound on the calibration error which, to the best of our knowledge, is the first super-$\sqrt{T}$ lower bound for this setting.
2 code implementations • ICML 2020 • Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré
We validate our proposed scheme on image and text datasets.
no code implementations • 12 Dec 2019 • Weihao Kong, Gregory Valiant, Emma Brunskill
We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting.
1 code implementation • NeurIPS 2020 • Justin Y. Chen, Gregory Valiant, Paul Valiant
Crucially, we assume that the sets $A$ and $B$ are drawn according to some known distribution $P$ over pairs of subsets of $[n]$.
4 code implementations • NeurIPS 2019 • Antonio Ginart, Melody Y. Guan, Gregory Valiant, James Zou
Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used --- the EU's Right To Be Forgotten regulation is an example of this effort.
no code implementations • 3 Jun 2019 • Melody Y. Guan, Gregory Valiant
Recent work on adversarial examples has demonstrated that most natural inputs can be perturbed to fool even state-of-the-art machine learning systems.
no code implementations • ICML 2020 • Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant
In the Gaussian case, we show that an $\left(n, n+\Theta(\frac{n}{\sqrt{d}})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples.
no code implementations • 19 Apr 2019 • Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant
We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term: the sum, over the data points, of the squared $\ell_2$ norm of the gradient of the model with respect to the parameter vector, evaluated at each data point.
no code implementations • 18 Apr 2019 • Vatsal Sharan, Aaron Sidford, Gregory Valiant
We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints.
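For context, the memory-efficient end of the tradeoff is single-pass streaming SGD, which stores only the current iterate in $O(d)$ memory, whereas second-order methods store $O(d^2)$ state. A sketch (illustrative, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_true = rng.normal(size=d)

# Streaming SGD: one pass, constant memory per example, O(d) total state.
w = np.zeros(d)
lr = 0.01
for _ in range(20000):
    x = rng.normal(size=d)
    y = x @ w_true                      # noiseless label for this sketch
    w -= lr * (x @ w - y) * x           # gradient step on the squared loss
```

The paper's lower bound says that any algorithm restricted to subquadratic memory, like this one, must converge more slowly than unconstrained algorithms on some instances.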
no code implementations • 12 Feb 2019 • Mingda Qiao, Gregory Valiant
The algorithm is allowed to choose when to make the prediction as well as the length of the prediction window, possibly depending on the observations so far.
no code implementations • 12 Feb 2019 • Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade
Precisely, for sufficiently large $N$, the MLE achieves the information-theoretically optimal error bound of $\mathcal{O}(\frac{1}{t})$ for $t < c\log{N}$, with respect to the earth mover's distance (between the estimated and true distributions).
3 code implementations • 25 Jan 2019 • Kai Sheng Tai, Peter Bailis, Gregory Valiant
How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network?
no code implementations • NeurIPS 2018 • Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant
This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset.
no code implementations • NeurIPS 2018 • Weihao Kong, Gregory Valiant
In this setting, we show that with $O(\sqrt{d})$ samples, one can accurately estimate the fraction of the variance of the label that can be explained via the best linear function of the data.
no code implementations • 22 Nov 2017 • Mingda Qiao, Gregory Valiant
Specifically, we consider the setting where there is some underlying distribution, $p$, and each data source provides a batch of $\ge k$ samples, with the guarantee that at least a $(1-\epsilon)$ fraction of the sources draw their samples from a distribution with total variation distance at most $\eta$ from $p$.
no code implementations • NeurIPS 2017 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.
1 code implementation • 7 Nov 2017 • Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant
We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model.
no code implementations • NeurIPS 2017 • Kevin Tian, Weihao Kong, Gregory Valiant
Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_i \in [0, 1]$, and we observe $n$ independent random variables, $X_1,\ldots, X_n$, with $X_i \sim $ Binomial$(t, p_i)$.
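A hedged illustration of why pooling across entities helps: simple empirical-Bayes shrinkage of the naive estimates $X_i/t$ toward the grand mean already beats them in mean squared error (the paper's estimators recover much finer structure than this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 5000, 10
p = rng.beta(2, 2, size=n)              # unknown parameters (illustrative prior)
X = rng.binomial(t, p)

naive = X / t                           # per-entity MLE

# Decompose the observed variance into signal (spread of the p_i) plus
# binomial noise, then shrink by the estimated signal fraction.
noise = np.mean(X * (t - X) / (t**2 * (t - 1)))   # avg noise variance of X/t
signal = max(naive.var() - noise, 0.0)
shrunk = naive.mean() + signal / (signal + noise) * (naive - naive.mean())

mse_naive = float(np.mean((naive - p) ** 2))
mse_shrunk = float(np.mean((shrunk - p) ** 2))
```

Here `X * (t - X) / (t**2 * (t - 1))` is an unbiased estimate of each entity's noise variance $p_i(1-p_i)/t$, which is what makes the signal/noise split well-founded.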
no code implementations • 9 Aug 2017 • Michela Meister, Gregory Valiant
This setting can be viewed as an instance of the semi-verified learning model introduced in [CSV17], which explores the tradeoff between the number of items evaluated by each worker and the fraction of good evaluators.
no code implementations • 25 Jun 2017 • Vatsal Sharan, Kai Sheng Tai, Peter Bailis, Gregory Valiant
What learning algorithms can be run directly on compressively-sensed data?
no code implementations • 15 Mar 2017 • Jacob Steinhardt, Moses Charikar, Gregory Valiant
We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data.
no code implementations • ICML 2017 • Vatsal Sharan, Gregory Valiant
The popular Alternating Least Squares (ALS) algorithm for tensor decomposition is efficient and easy to implement, but often converges to poor local optima---particularly when the weights of the factors are non-uniform.
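For reference, here is one ALS cycle for a rank-one CP decomposition, where each least-squares subproblem has a closed form; the paper analyzes a modified variant of this loop, so treat this as a baseline sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# A rank-one tensor a (x) b (x) c to be recovered (illustrative).
a, b, c = rng.normal(size=(3, n))
T = np.einsum('i,j,k->ijk', a, b, c)

# ALS: fix two factors, solve for the third in closed form, and cycle.
x, y, z = rng.normal(size=(3, n))
for _ in range(20):
    x = np.einsum('ijk,j,k->i', T, y, z) / ((y @ y) * (z @ z))
    y = np.einsum('ijk,i,k->j', T, x, z) / ((x @ x) * (z @ z))
    z = np.einsum('ijk,i,j->k', T, x, y) / ((x @ x) * (y @ y))

T_hat = np.einsum('i,j,k->ijk', x, y, z)
```

On an exact rank-one tensor ALS converges essentially immediately; the convergence problems the abstract refers to arise at higher rank, especially with non-uniform factor weights.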
no code implementations • 8 Dec 2016 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.
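The trivial predictor is concrete: estimate the distribution of the next symbol conditioned on the previous length-$\ell$ window from empirical counts. A sketch on a small synthetic HMM (all parameters are illustrative):

```python
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(0)

# A 2-state HMM with sticky hidden states (illustrative parameters).
T = np.array([[0.95, 0.05], [0.05, 0.95]])   # hidden-state transitions
E = np.array([[0.9, 0.1], [0.1, 0.9]])       # emission probs over {0, 1}
N = 50000

h = 0
obs = []
for u, v in rng.random(size=(N, 2)):
    obs.append(int(u < E[h][1]))             # emit symbol 1 w.p. E[h][1]
    h = int(v < T[h][1])                     # move to state 1 w.p. T[h][1]

# Trivial predictor: empirical next-symbol counts per length-ell window.
ell = 5
counts = defaultdict(Counter)
for i in range(N - ell):
    counts[tuple(obs[i:i + ell])][obs[i + ell]] += 1

def predict(window):
    c = counts[tuple(window)]
    return max(c, key=c.get) if c else 0
```

The point of the result is that windows of length $O(\log n/\epsilon)$ suffice to approach the information-theoretic error $I$, given a long enough training sequence.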
no code implementations • 7 Nov 2016 • Moses Charikar, Jacob Steinhardt, Gregory Valiant
For example, given a dataset of $n$ points for which an unknown subset of $\alpha n$ points are drawn from a distribution of interest, and no assumptions are made about the remaining $(1-\alpha)n$ points, is it possible to return a list of $\operatorname{poly}(1/\alpha)$ answers, one of which is correct?
no code implementations • NeurIPS 2016 • Jacob Steinhardt, Gregory Valiant, Moses Charikar
We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers.
no code implementations • 21 Feb 2016 • Qingqing Huang, Sham M. Kakade, Weihao Kong, Gregory Valiant
When can accurate reconstruction be accomplished in the sparse data regime?
1 code implementation • 30 Jan 2016 • Weihao Kong, Gregory Valiant
We consider this fundamental recovery problem in the regime where the number of samples is comparable, or even sublinear in the dimensionality of the distribution in question.
no code implementations • 21 Apr 2015 • Gregory Valiant, Paul Valiant
One conceptual implication of this result is that for large samples, Bayesian assumptions on the "shape" or bounds on the tail probabilities of a distribution over discrete support are not helpful for the task of learning the distribution.
no code implementations • NeurIPS 2015 • Bhaswar B. Bhattacharya, Gregory Valiant
We consider the problem of closeness testing for two discrete distributions in the practically relevant setting of \emph{unequal} sized samples drawn from each of them.
no code implementations • NeurIPS 2013 • Paul Valiant, Gregory Valiant
Recently, [Valiant and Valiant] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear-sized sample.
no code implementations • 7 Oct 2013 • Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant
This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples n and the data dimension d are relatively large.
no code implementations • 19 Aug 2013 • Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, Paul Valiant
We study the question of closeness testing for two discrete distributions.
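A standard building block for such testers is an unbiased (under Poissonization) estimator of $\|p-q\|_2^2$ computed from the two sample histograms; a sketch (illustrative, not the specific tester of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000        # domain size
m = 20000       # samples drawn from each distribution

p = np.full(n, 1.0 / n)
q = p.copy()
q[: n // 2] *= 1.5          # reweight half the domain up ...
q[n // 2:] *= 0.5           # ... and half down, so q != p but sums to 1

def l2_stat(xs, ys, num):
    # Estimates m^2 * ||p - q||_2^2: the -X - Y terms cancel the
    # additive bias of the squared count difference.
    X = np.bincount(xs, minlength=num)
    Y = np.bincount(ys, minlength=num)
    return int(np.sum((X - Y) ** 2 - X - Y))

same = l2_stat(rng.choice(n, size=m, p=p), rng.choice(n, size=m, p=p), n)
diff = l2_stat(rng.choice(n, size=m, p=p), rng.choice(n, size=m, p=q), n)
```

The statistic concentrates near zero when the distributions are equal and near $m^2\|p-q\|_2^2$ otherwise, so thresholding it distinguishes the two cases.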