no code implementations • NAACL (SocialNLP) 2021 • Ivy Cao, Zizhou Liu, Giannis Karamanolakis, Daniel Hsu, Luis Gravano
As of now, however, it is not clear how and to what extent the pandemic has affected restaurant reviews, an analysis of which could potentially inform policies for addressing this ongoing situation.
1 code implementation • 14 Feb 2024 • Clayton Sanford, Daniel Hsu, Matus Telgarsky
We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation.
no code implementations • 1 Feb 2024 • Samuel Deng, Daniel Hsu
The multi-group learning model formalizes the learning scenario in which a single predictor must generalize well on multiple, possibly overlapping subgroups of interest.
no code implementations • 27 Jan 2024 • Daniel Hsu, Jizhou Huang, Brendan Juba
In this work, we give positive and negative results on auditing for Gaussian distributions. On the positive side, we present an alternative approach to leverage these advances in agnostic learning and thereby obtain the first polynomial-time approximation scheme (PTAS) for auditing nontrivial combinatorial subgroup fairness: we show how to audit statistical notions of fairness over homogeneous halfspace subgroups when the features are Gaussian.
no code implementations • 24 Dec 2023 • Gan Yuan, Mingyue Xu, Samory Kpotufe, Daniel Hsu
We consider the problem of sufficient dimension reduction (SDR) for multi-index models.
no code implementations • 9 Jul 2023 • Daniel Hsu, Arya Mazumdar
The logistic regression model is one of the most popular data generation models in noisy binary classification problems.
no code implementations • 7 Mar 2023 • Samuel Deng, Navid Ardeshir, Daniel Hsu
We consider the problem of distribution-free conformal prediction and the criterion of group conditional validity.
1 code implementation • 10 Jun 2022 • Navid Ardeshir, Daniel Hsu, Clayton Sanford
We study the structural and statistical properties of $\mathcal{R}$-norm minimizing interpolants of datasets labeled by specific target functions.
no code implementations • 15 Apr 2022 • Rishabh Dudeja, Daniel Hsu
Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter.
no code implementations • 18 Feb 2022 • Bingbin Liu, Daniel Hsu, Pradeep Ravikumar, Andrej Risteski
This lens is undoubtedly very interesting, but suffers from the problem that there isn't a "canonical" set of downstream tasks to focus on -- in practice, this problem is usually resolved by competing on the benchmark dataset du jour.
no code implementations • 10 Feb 2022 • Daniel Hsu, Clayton Sanford, Rocco Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis
This lower bound is essentially best possible since an SQ algorithm of Klivans et al. (2008) agnostically learns this class to any constant excess error using $n^{O(\log k)}$ queries of tolerance $n^{-O(\log k)}$.
no code implementations • 18 Jan 2022 • Samuel Deng, Yilin Guo, Daniel Hsu, Debmalya Mandal
Prior works on learning linear representations for meta-learning assume that there is a common shared representation across different tasks, and do not consider the additional task-specific observable side information.
no code implementations • 22 Dec 2021 • Christopher Tosh, Daniel Hsu
Multi-group agnostic learning is a formal learning criterion that is concerned with the conditional risks of predictors within subgroups of a population.
no code implementations • NeurIPS 2021 • Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire
We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon.
1 code implementation • NeurIPS 2021 • Navid Ardeshir, Clayton Sanford, Daniel Hsu
The support vector machine (SVM) and minimum Euclidean norm least squares regression are two fundamentally different approaches to fitting linear models, but they have recently been connected in models for very high-dimensional data through a phenomenon of support vector proliferation, where every training example used to fit an SVM becomes a support vector.
no code implementations • ICLR 2021 • Daniel Hsu, Ziwei Ji, Matus Telgarsky, Lan Wang
This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds.
no code implementations • 3 Feb 2021 • Daniel Hsu, Clayton Sanford, Rocco A. Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis
This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions?
no code implementations • 4 Jan 2021 • Daniel Hsu
Discovering approximately optimal policies, a task known as policy optimization, is crucial to applying reinforcement learning (RL) in many real-world scenarios.
no code implementations • 4 Dec 2020 • Bo Cowgill, Fabrizio Dell'Acqua, Samuel Deng, Daniel Hsu, Nakul Verma, Augustin Chaintreau
We find that biased predictions are mostly caused by biased training data.
no code implementations • EMNLP (Louhi) 2020 • Ziyi Liu, Giannis Karamanolakis, Daniel Hsu, Luis Gravano
To improve performance without extra annotations, we create artificial training documents in the target language through machine translation and train mBERT jointly for the source (English) and target language.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Giannis Karamanolakis, Daniel Hsu, Luis Gravano
In this work, we propose a cross-lingual teacher-student method, CLTS, that generates "weak" supervision in the target language using minimal cross-lingual resources, in the form of a small number of word translations.
no code implementations • 22 Sep 2020 • Daniel Hsu, Vidya Muthukumar, Ji Xu
The support vector machine (SVM) is a well-established classification method whose name refers to the particular training examples, called support vectors, that determine the maximum margin separating hyperplane.
no code implementations • 24 Aug 2020 • Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu
Self-supervised learning is an empirically successful approach to unsupervised learning based on creating artificial supervised learning problems.
no code implementations • 10 Aug 2020 • Rishabh Dudeja, Daniel Hsu
Our analysis reveals that the optimal sample complexity in the SQ model depends on whether $\mathbb{E} \mathbf{T}_1$ is symmetric or not.
no code implementations • 13 Jul 2020 • José Manuel Zorrilla Matilla, Manasi Sharma, Daniel Hsu, Zoltán Haiman
Deep Neural Networks (DNNs) are powerful algorithms that have been proven capable of extracting non-Gaussian information from weak lensing (WL) data sets.
Cosmology and Nongalactic Astrophysics
2 code implementations • NeurIPS 2020 • Debmalya Mandal, Samuel Deng, Suman Jana, Jeannette M. Wing, Daniel Hsu
In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples.
no code implementations • 16 May 2020 • Vidya Muthukumar, Adhyyan Narang, Vignesh Subramanian, Mikhail Belkin, Daniel Hsu, Anant Sahai
We compare classification and regression tasks in an overparameterized linear model with Gaussian features.
no code implementations • 4 Mar 2020 • Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu
Contrastive learning is an approach to representation learning that utilizes naturally occurring similar and dissimilar pairs of data points to find useful embeddings of data.
no code implementations • 30 Dec 2019 • Daniel Hsu
The results show that the proposed method significantly outperforms uncertainty-based methods at learning reward models and achieves better query efficiency: the adversarial discriminator helps the agent learn human behavior more efficiently, and the SR selects states that have a stronger impact on the value function.
no code implementations • WS 2019 • Giannis Karamanolakis, Daniel Hsu, Luis Gravano
In many review classification applications, a fine-grained analysis of the reviews is desirable, because different segments (e.g., sentences) of a review may focus on different aspects of the entity in question.
no code implementations • 4 Sep 2019 • Mathias Lecuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel Hsu
Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores.
1 code implementation • IJCNLP 2019 • Giannis Karamanolakis, Daniel Hsu, Luis Gravano
In this work, we consider weakly supervised approaches for training aspect classifiers that only require the user to provide a small set of seed words (i.e., weakly positive indicators) for the aspects of interest.
no code implementations • 8 Jul 2019 • Michał Dereziński, Manfred K. Warmuth, Daniel Hsu
We use them to show that for any input distribution and $\epsilon>0$ there is a random design consisting of $O(d\log d+ d/\epsilon)$ points from which an unbiased estimator can be constructed whose expected square loss over the entire distribution is bounded by $1+\epsilon$ times the loss of the optimum.
no code implementations • 8 Jun 2019 • Yu-cheng Chen, Matus Telgarsky, Chao Zhang, Bolton Bailey, Daniel Hsu, Jian Peng
This paper provides a simple procedure to fit generative networks to target distributions, with the goal of a small Wasserstein distance (or other optimal transport costs).
no code implementations • 7 Jun 2019 • Kevin Shi, Daniel Hsu, Allison Bishop
We propose a new randomized ensemble technique with a provable security guarantee against black-box transfer attacks.
no code implementations • 5 Jun 2019 • Christopher Tosh, Daniel Hsu
We introduce interactive structure discovery, a generic framework that encompasses many interactive learning settings, including active learning, top-k item identification, interactive drug discovery, and others.
no code implementations • NeurIPS 2019 • Ji Xu, Daniel Hsu
We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance.
no code implementations • 18 Mar 2019 • Mikhail Belkin, Daniel Hsu, Ji Xu
The "double descent" risk curve was proposed to qualitatively describe the out-of-sample prediction accuracy of variably-parameterized machine learning models.
no code implementations • ICLR Workshop LLD 2019 • Giannis Karamanolakis, Daniel Hsu, Luis Gravano
In this work, we propose a weakly supervised approach for training neural networks for aspect extraction in cases where only a small set of seed words, i.e., keywords that describe an aspect, is available.
1 code implementation • 10 Feb 2019 • Dezső Ribli, Bálint Ármin Pataki, José Manuel Zorrilla Matilla, Daniel Hsu, Zoltán Haiman, István Csabai
Previous studies attempted to extract non-Gaussian information from weak lensing observations through several higher-order statistics such as the three-point correlation function, peak counts, or Minkowski functionals.
Cosmology and Nongalactic Astrophysics
no code implementations • 5 Feb 2019 • Ji Xu, Arian Maleki, Kamiar Rahnama Rad, Daniel Hsu
This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting $n, p \rightarrow \infty$ and $n/p \rightarrow \delta>1$ ($\delta$ is a fixed number), and proves the consistency of three risk estimates that have been successful in numerical studies, i.e., leave-one-out cross validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques.
3 code implementations • 28 Dec 2018 • Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.
no code implementations • NeurIPS 2018 • Ji Xu, Daniel Hsu, Arian Maleki
Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find stationary points of the log-likelihood objective.
no code implementations • 4 Oct 2018 • Michał Dereziński, Manfred K. Warmuth, Daniel Hsu
Without any assumptions on the noise, the linear least squares solution for any i.i.d.
no code implementations • NeurIPS 2018 • Mikhail Belkin, Daniel Hsu, Partha Mitra
Finally, this paper suggests a way to explain the phenomenon of adversarial examples, which are seemingly ubiquitous in modern machine learning, and also discusses some connections to kernel machines and random forests in the interpolated regime.
no code implementations • NeurIPS 2018 • Michał Dereziński, Manfred K. Warmuth, Daniel Hsu
We then develop a new rescaled variant of volume sampling that produces an unbiased estimate which avoids this bad behavior and has at least as good a tail bound as leverage score sampling: sample size $k=O(d\log d + d/\epsilon)$ suffices to guarantee total loss at most $1+\epsilon$ times the minimum with high probability.
6 code implementations • 9 Feb 2018 • Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana
Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth.
no code implementations • 4 Feb 2018 • Arushi Gupta, José Manuel Zorrilla Matilla, Daniel Hsu, Zoltán Haiman
Weak lensing maps contain information beyond two-point statistics on small scales.
no code implementations • NeurIPS 2015 • Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári
The interval is constructed around the relaxation time $t_{\text{relax}} = 1/\gamma$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
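As a small worked illustration of the quantity $t_{\text{relax}} = 1/\gamma$ in the entry above, the sketch below computes it exactly for a two-state chain whose transition matrix is known. It rests on assumptions not stated in the excerpt: $\gamma$ is taken to be the absolute spectral gap, and the function name `two_state_relaxation_time` is hypothetical. It does not implement the paper's estimator, which works from a single sample path rather than from a known transition matrix.

```python
import math


def two_state_relaxation_time(p: float, q: float) -> float:
    """Relaxation time 1/gamma of the chain P = [[1-p, p], [q, 1-q]].

    Assumption: gamma is the absolute spectral gap, 1 - max|lambda| over
    the non-trivial eigenvalues. For this chain the eigenvalues are
    1 and 1 - p - q.
    """
    if not (0.0 < p <= 1.0 and 0.0 < q <= 1.0):
        raise ValueError("p and q must lie in (0, 1]")
    lambda_2 = 1.0 - p - q           # the only non-trivial eigenvalue
    gamma = 1.0 - abs(lambda_2)      # absolute spectral gap
    if gamma == 0.0:
        return math.inf              # p = q = 1: periodic chain, no gap
    return 1.0 / gamma


if __name__ == "__main__":
    # A "sticky" chain leaves its current state rarely, so it relaxes slowly.
    print(two_state_relaxation_time(0.1, 0.1))
    # A chain that resamples its state uniformly at each step relaxes fastest.
    print(two_state_relaxation_time(0.5, 0.5))
```

Smaller values of `p` and `q` give a smaller gap and hence a longer relaxation time.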
no code implementations • 9 Aug 2017 • Daniel Hsu
In this paper, we use a variational recurrent neural network to investigate the anomaly detection problem on graph time series.
no code implementations • 23 Jul 2017 • Daniel Hsu
Time series account for a large proportion of the data stored in financial, medical and scientific databases.
no code implementations • 3 Jul 2017 • Daniel Hsu
Previous methods based on stacked recurrent neural network (RNN) and deep belief network (DBN) models cannot model the tendencies in multiple periods, and no models for sequential data pay special attention to redundant input variables that have no impact, or even a negative impact, on prediction and modeling.
no code implementations • 2 Jun 2017 • Arushi Gupta, Daniel Hsu
The underlying parameters of the model were previously shown to be identifiable from the choice probabilities for the all-products assortment, together with choice probabilities for assortments of all-but-one products.
no code implementations • NeurIPS 2017 • Daniel Hsu, Kevin Shi, Xiaorui Sun
Next, in an average-case and noise-free setting where the responses exactly correspond to a linear function of i.i.d.
no code implementations • 13 Jan 2017 • Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha
First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection.
no code implementations • NeurIPS 2016 • Ji Xu, Daniel Hsu, Arian Maleki
Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models.
no code implementations • 21 Jul 2016 • Daniel Hsu, Matus Telgarsky
This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost.
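The greedy procedure described in the entry above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the cost is assumed to be the sum of squared Euclidean distances to the nearest chosen center (a k-means-style cost), and the names `sq_dist`, `clustering_cost`, and `greedy_centers` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Assumed cost: each point pays its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def greedy_centers(points: Sequence[Point],
                   candidates: Sequence[Point],
                   rounds: int) -> List[Point]:
    """Grow a center set, each round adding the candidate that lowers the cost most."""
    centers: List[Point] = []
    for _ in range(rounds):
        remaining = [c for c in candidates if c not in centers]
        if not remaining:
            break
        # Minimizing the cost after adding c is the same as maximizing the decrease.
        best = min(remaining, key=lambda c: clustering_cost(points, centers + [c]))
        centers.append(best)
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    chosen = greedy_centers(data, candidates=data, rounds=2)
    print(chosen, clustering_cost(data, chosen))
```

In the bi-criterion setting, `rounds` may exceed the target number of clusters, trading extra centers for lower cost.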
no code implementations • NeurIPS 2016 • Alina Beygelzimer, Daniel Hsu, John Langford, Chicheng Zhang
We investigate active learning with access to two distinct oracles: Label (which is standard) and Search (which is not).
1 code implementation • TACL 2016 • Karl Stratos, Michael Collins, Daniel Hsu
We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem.
no code implementations • NeurIPS 2015 • Daniel Hsu, Aryeh Kontorovich, Csaba Szepesvári
The interval is constructed around the relaxation time $t_{\text{relax}}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
no code implementations • 2 Oct 2014 • Alekh Agarwal, Alina Beygelzimer, Daniel Hsu, John Langford, Matus Telgarsky
Can we effectively learn a nonlinear representation in time comparable to linear learning?
no code implementations • NeurIPS 2014 • Kamalika Chaudhuri, Daniel Hsu, Shuang Song
A basic problem in the design of privacy-preserving algorithms is the private maximization problem: the goal is to pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy.
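For context on the private maximization problem stated above, a standard baseline is the exponential mechanism, which samples an item with probability proportional to $\exp(\epsilon \cdot \text{score} / (2\Delta))$, where $\Delta$ is the sensitivity of the score function. The sketch below shows that baseline only. It is not claimed to be the method of the paper listed here, and the function names and the default `sensitivity=1.0` are assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Sequence


def exponential_mechanism_probs(scores: Sequence[float],
                                epsilon: float,
                                sensitivity: float = 1.0) -> List[float]:
    """Sampling probabilities proportional to exp(epsilon * score / (2 * sensitivity))."""
    scaled = [epsilon * s / (2.0 * sensitivity) for s in scores]
    shift = max(scaled)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - shift) for v in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def private_argmax(items: Sequence[str],
                   scores: Sequence[float],
                   epsilon: float,
                   sensitivity: float = 1.0,
                   rng: Optional[random.Random] = None) -> str:
    """Draw one item; higher-scoring items are exponentially more likely."""
    rng = rng or random.Random()
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(list(items), weights=probs, k=1)[0]


if __name__ == "__main__":
    items = ["a", "b", "c"]
    scores = [1.0, 3.0, 2.0]  # hypothetical data-dependent scores
    print(exponential_mechanism_probs(scores, epsilon=1.0))
    print(private_argmax(items, scores, epsilon=1.0, rng=random.Random(0)))
```

A larger `epsilon` concentrates the probability mass on the true maximizer at the price of a weaker privacy guarantee.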
1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.
no code implementations • NeurIPS 2013 • Animashree Anandkumar, Daniel Hsu, Majid Janzamin, Sham Kakade
This set of higher-order expansion conditions allows for overcomplete models, and requires the existence of a perfect matching from latent topics to higher-order observed words.
no code implementations • 7 Jul 2013 • Daniel Hsu, Sivan Sabato
This work studies applications and generalizations of a simple estimation technique that provides exponential concentration under heavy-tailed distributions, assuming only bounded low-order moments.
no code implementations • 12 Feb 2013 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade
We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method.
no code implementations • 13 Dec 2012 • Sivan Sabato, Shai Shalev-Shwartz, Nathan Srebro, Daniel Hsu, Tong Zhang
We consider the problem of learning a non-negative linear classifier with a $1$-norm of at most $k$, and a fixed threshold, under the hinge-loss.
no code implementations • 29 Oct 2012 • Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).
no code implementations • 24 Sep 2012 • Animashree Anandkumar, Daniel Hsu, Adel Javanmard, Sham M. Kakade
The sufficient conditions for identifiability of these models are primarily based on weak expansion constraints on the topic-word matrix, for topic models, and on the directed acyclic graph, for Bayesian networks.
1 code implementation • 3 Mar 2012 • Animashree Anandkumar, Daniel Hsu, Sham M. Kakade
Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations.
no code implementations • 13 Jun 2011 • Daniel Hsu, Sham M. Kakade, Tong Zhang
The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither of which is present in the fixed design setting.
no code implementations • 26 Nov 2008 • Daniel Hsu, Sham M. Kakade, Tong Zhang
Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series.