Search Results for author: Jason D. Lee

Found 131 papers, 25 papers with code

REBEL: Reinforcement Learning via Regressing Relative Rewards

no code implementations 25 Apr 2024 Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.
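
The snippet above centers on PPO, whose core update is the clipped surrogate objective. A minimal NumPy sketch of that objective (function name and toy inputs are illustrative, not from the paper):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective in the style of PPO.

    ratio: pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: estimated advantage for each sampled action
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum makes the objective pessimistic, so the
    # policy gains nothing from moving the ratio far outside [1-eps, 1+eps].
    return np.minimum(unclipped, clipped).mean()
```

With `eps=0.2`, a ratio of 2.0 on a positive advantage is credited only as 1.2, which is the mechanism that keeps updates conservative.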

Dataset Reset Policy Optimization for RLHF

2 code implementations 12 Apr 2024 Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

Motivated by the fact that an offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution.
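
The dataset-reset idea described above can be sketched in a few lines. This is a hypothetical illustration (names, `reset_prob`, and the state representation are my own, not DR-PO's actual interface):

```python
import random

def sample_start_state(offline_states, initial_state_sampler,
                       reset_prob=0.5, rng=None):
    """With some probability, start the online rollout from a state seen in
    the offline preference dataset rather than from the initial state
    distribution -- the 'dataset reset' idea in spirit."""
    rng = rng or random.Random()
    if offline_states and rng.random() < reset_prob:
        # Reset to an informative state from the offline preference data.
        return rng.choice(offline_states)
    # Otherwise fall back to the environment's usual start distribution.
    return initial_state_sampler()
```

A real implementation would require a simulator that supports resetting to arbitrary states, which is the assumption the paper exploits.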

Reinforcement Learning (RL)

Horizon-Free Regret for Linear Markov Decision Processes

no code implementations 15 Mar 2024 Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon S. Du

A recent line of work showed that regret bounds in reinforcement learning (RL) can be (nearly) independent of the planning horizon, a.k.a. horizon-free bounds.

Reinforcement Learning (RL)

Computational-Statistical Gaps in Gaussian Single-Index Models

no code implementations 8 Mar 2024 Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna

Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation.

How Well Can Transformers Emulate In-context Newton's Method?

no code implementations 5 Mar 2024 Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee

Recent studies have suggested that Transformers can implement first-order optimization algorithms for in-context learning and even second order ones for the case of linear regression.

In-Context Learning regression

How Transformers Learn Causal Structure with Gradient Descent

no code implementations 22 Feb 2024 Eshaan Nichani, Alex Damian, Jason D. Lee

The key insight of our proof is that the gradient of the attention matrix encodes the mutual information between tokens.

In-Context Learning

LoRA Training in the NTK Regime has No Spurious Local Minima

1 code implementation 19 Feb 2024 Uijeong Jang, Jason D. Lee, Ernest K. Ryu

Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLM), but our theoretical understanding of LoRA has been limited.
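
LoRA, named in the snippet above, freezes the pretrained weight and learns only a low-rank correction. A hedged NumPy sketch of the forward pass (shapes, names, and `alpha` are illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA-style forward pass: W (out x in) stays frozen; only the
    low-rank factors A (r x in) and B (out x r) would be trained.
    The effective weight is W + alpha * B @ A, whose update has rank <= r."""
    return x @ (W + alpha * B @ A).T
```

When `B` is initialized to zero (the usual convention), the adapted model starts out identical to the frozen base model.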

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

1 code implementation 18 Feb 2024 Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard.

Benchmarking

BitDelta: Your Fine-Tune May Only Be Worth One Bit

1 code implementation 15 Feb 2024 James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks.
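
BitDelta's premise is that the fine-tune delta (fine-tuned minus base weights) is highly compressible. A simplified sketch of the 1-bit-plus-scale idea (the paper additionally calibrates scales by distillation; here the scale is just the mean absolute delta, which is the L2-optimal per-sign scale):

```python
import numpy as np

def bitdelta_compress(w_base, w_finetuned):
    """Keep only the sign of the fine-tune delta (1 bit per weight in a
    real implementation) plus a single scale for the whole tensor."""
    delta = w_finetuned - w_base
    scale = np.abs(delta).mean()
    signs = np.sign(delta)
    return signs, scale

def bitdelta_decompress(w_base, signs, scale):
    """Reconstruct an approximation of the fine-tuned weights."""
    return w_base + scale * signs
```

When the delta entries all share one magnitude, reconstruction is exact; otherwise the sign-plus-scale pair is an approximation whose error the paper's distillation step reduces.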

An Information-Theoretic Analysis of In-Context Learning

no code implementations 28 Jan 2024 Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy

Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted.

In-Context Learning Meta-Learning

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

1 code implementation 19 Jan 2024 Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration.

Towards Optimal Statistical Watermarking

no code implementations 13 Dec 2023 Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.

Optimal Multi-Distribution Learning

no code implementations 8 Dec 2023 Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee

Focusing on a hypothesis class of Vapnik-Chervonenkis (VC) dimension $d$, we propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$ (modulo some logarithmic factor), matching the best-known lower bound.

Fairness

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

1 code implementation 30 Nov 2023 Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu

Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in learning arithmetic tasks: a neural net first "memorizes" the training set, resulting in perfect training accuracy but near-random test accuracy, and after training for sufficiently longer, it suddenly transitions to perfect test accuracy.

Learning Hierarchical Polynomials with Three-Layer Neural Networks

no code implementations 23 Nov 2023 ZiHao Wang, Eshaan Nichani, Jason D. Lee

Our main result shows that for a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error in $\widetilde{\mathcal{O}}(d^k)$ samples and polynomial time.

REST: Retrieval-Based Speculative Decoding

1 code implementation 14 Nov 2023 Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He

We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.
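
REST drafts continuations by retrieval from a datastore rather than with a smaller draft model. A toy sketch of the retrieve-then-draft half of the loop (REST actually uses an exact-suffix-match datastore and verifies drafts with the LLM in one parallel pass; the dictionary index and names here are simplifications of mine):

```python
def build_suffix_index(corpus_tokens, context_len=2):
    """Toy retrieval datastore: map each length-2 context to the token
    that followed it in the corpus."""
    index = {}
    for i in range(len(corpus_tokens) - context_len):
        key = tuple(corpus_tokens[i:i + context_len])
        index.setdefault(key, corpus_tokens[i + context_len])
    return index

def retrieve_draft(index, prefix, max_draft=4, context_len=2):
    """Greedily extend the prefix with retrieved tokens; a verifier model
    would then accept a prefix of this draft, recovering lossless output."""
    draft = []
    context = list(prefix[-context_len:])
    for _ in range(max_draft):
        nxt = index.get(tuple(context))
        if nxt is None:
            break
        draft.append(nxt)
        context = context[1:] + [nxt]
    return draft
```

Because drafting is a lookup, it adds almost no compute; all model FLOPs go into the single verification pass over the drafted tokens.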

Language Modelling Retrieval +1

Settling the Sample Complexity of Online Reinforcement Learning

no code implementations 25 Jul 2023 Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du

While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a "large-sample" regime, imposing enormous burn-in cost in order for their algorithms to operate optimally.

reinforcement-learning Reinforcement Learning (RL)

Teaching Arithmetic to Small Transformers

1 code implementation 7 Jul 2023 Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos

Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed.

Low-Rank Matrix Completion

Scaling In-Context Demonstrations with Structured Attention

no code implementations 5 Jul 2023 Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang

However, their capabilities of in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations.

In-Context Learning Sentence

Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms

no code implementations NeurIPS 2023 Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee

We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity.

Solving Robust MDPs through No-Regret Dynamics

no code implementations 30 May 2023 Etash Kumar Guha, Jason D. Lee

Reinforcement Learning is a powerful framework for training agents to navigate different situations, but it is susceptible to changes in environmental dynamics.

Navigate Policy Gradient Methods

Provable Reward-Agnostic Preference-Based Reinforcement Learning

no code implementations 29 May 2023 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals.

reinforcement-learning

Reward Collapse in Aligning Large Language Models

1 code implementation 28 May 2023 Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su

This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic regime.

Fine-Tuning Language Models with Just Forward Passes

2 code implementations NeurIPS 2023 Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.
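
The paper's memory savings come from estimating gradients with forward passes only. A sketch of one zeroth-order (SPSA-style) step in that spirit, with NumPy (the hyperparameters and the dense perturbation are illustrative; MeZO additionally regenerates the perturbation from a seed to avoid storing it):

```python
import numpy as np

def zo_step(params, loss_fn, lr=0.02, eps=1e-3, seed=0):
    """One zeroth-order step: two forward passes along a shared random
    direction z estimate the directional derivative, so no backpropagation
    (and no activation memory) is needed."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)
    # Finite-difference estimate of grad(loss) . z
    proj_grad = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)
    return params - lr * proj_grad * z
```

On a toy quadratic, repeating this step with fresh directions drives the loss toward zero, which is the basic reason forward-only fine-tuning can work at all.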

In-Context Learning Multiple-choice

Provable Offline Preference-Based Reinforcement Learning

no code implementations 24 May 2023 Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE.

reinforcement-learning

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

1 code implementation 8 May 2023 Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee

Motivated by the observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.

Multi-agent Reinforcement Learning +1

Can We Find Nash Equilibria at a Linear Rate in Markov Games?

no code implementations 3 Mar 2023 Zhuoqing Song, Jason D. Lee, Zhuoran Yang

Second, when both players adopt the algorithm, their joint policy converges to a Nash equilibrium of the game.

Provably Efficient Reinforcement Learning via Surprise Bound

no code implementations 22 Feb 2023 Hanlin Zhu, Ruosong Wang, Jason D. Lee

Value function approximation is important in modern reinforcement learning (RL) problems especially when the state space is (infinitely) large.

reinforcement-learning Reinforcement Learning (RL)

Efficient displacement convex optimization with particle gradient descent

no code implementations 9 Feb 2023 Hadi Daneshmand, Jason D. Lee, Chi Jin

Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures.

Looped Transformers as Programmable Computers

1 code implementation 30 Jan 2023 Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos

We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop.

In-Context Learning

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing

no code implementations 27 Jan 2023 Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee

It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models.

Incremental Learning

Reconstructing Training Data from Model Gradient, Provably

no code implementations 7 Dec 2022 Zihan Wang, Jason D. Lee, Qi Lei

Understanding when and how much a model gradient leaks information about the training sample is an important question in privacy.

Federated Learning Tensor Decomposition

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

no code implementations 13 Oct 2022 Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan

When these potentials further satisfy certain self-bounding properties, we show that they can be used to provide a convergence guarantee for Gradient Descent (GD) and SGD (even when the paths of GF and GD/SGD are quite far apart).

Retrieval

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability

1 code implementation 30 Sep 2022 Alex Damian, Eshaan Nichani, Jason D. Lee

Our analysis provides precise predictions for the loss, sharpness, and deviation from the PGD trajectory throughout training, which we verify both empirically in a number of standard settings and theoretically under mild conditions.

PAC Reinforcement Learning for Predictive State Representations

no code implementations 12 Jul 2022 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

We show that given a realizable model class, the sample complexity of learning the near optimal policy only scales polynomially with respect to the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces.

reinforcement-learning Reinforcement Learning (RL)

Neural Networks can Learn Representations with Gradient Descent

no code implementations 30 Jun 2022 Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

Furthermore, in a transfer learning setup where the data distributions in the source and target domain share the same representation $U$ but have different polynomial heads we show that a popular heuristic for transfer learning has a target sample complexity independent of $d$.

Transfer Learning

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

no code implementations 24 Jun 2022 Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

We show our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space.

Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials

1 code implementation 8 Jun 2022 Eshaan Nichani, Yu Bai, Jason D. Lee

Next, we show that a wide two-layer neural network can jointly use the NTK and QuadNTK to fit target functions consisting of a dense low-degree term and a sparse high-degree term -- something neither the NTK nor the QuadNTK can do on their own.

Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games

no code implementations 3 Jun 2022 Wenhao Zhan, Jason D. Lee, Zhuoran Yang

We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.

Decision Making

On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias

no code implementations 18 May 2022 Itay Safran, Gal Vardi, Jason D. Lee

We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting.

Binary Classification

Nearly Minimax Algorithms for Linear Bandits with Shared Representation

no code implementations 29 Mar 2022 Jiaqi Yang, Qi Lei, Jason D. Lee, Simon S. Du

We give novel algorithms for multi-task and lifelong linear bandits with shared representation.

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

no code implementations 9 Feb 2022 Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).

Offline RL reinforcement-learning +1

Optimization-Based Separations for Neural Networks

no code implementations 4 Dec 2021 Itay Safran, Jason D. Lee

Depth separation results propose a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, establishing that the former possess superior approximation capabilities.

Provable Hierarchy-Based Meta-Reinforcement Learning

no code implementations 18 Oct 2021 Kurtland Chua, Qi Lei, Jason D. Lee

To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task.

Hierarchical Reinforcement Learning Learning Theory +4

Provable Regret Bounds for Deep Online Learning and Control

no code implementations 15 Oct 2021 Xinyi Chen, Edgar Minasyan, Jason D. Lee, Elad Hazan

The theory of deep learning focuses almost exclusively on supervised learning, non-convex optimization using stochastic gradient descent, and overparametrized neural networks.

Second-order methods

Towards General Function Approximation in Zero-Sum Markov Games

no code implementations ICLR 2022 Baihe Huang, Jason D. Lee, Zhaoran Wang, Zhuoran Yang

In the {coordinated} setting where both players are controlled by the agent, we propose a model-based algorithm and a model-free algorithm.

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

no code implementations NeurIPS 2021 Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

While the theory of RL has traditionally focused on linear function approximation (or eluder dimension) approaches, little is known about nonlinear RL with neural net approximations of the Q functions.

Reinforcement Learning (RL)

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

no code implementations NeurIPS 2021 Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem.

A Short Note on the Relationship of Information Gain and Eluder Dimension

no code implementations 6 Jul 2021 Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei

Eluder dimension and information gain are two widely used methods of complexity measures in bandit and reinforcement learning.

reinforcement-learning +1

Near-Optimal Linear Regression under Distribution Shift

no code implementations 23 Jun 2021 Qi Lei, Wei Hu, Jason D. Lee

Transfer learning is essential when sufficient data is available in the source domain but labeled data in the target domain is scarce.

regression Transfer Learning

Label Noise SGD Provably Prefers Flat Global Minimizers

no code implementations NeurIPS 2021 Alex Damian, Tengyu Ma, Jason D. Lee

In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

no code implementations 24 May 2021 Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi

These can often be accounted for via regularized RL, which augments the target value function with a structure-promoting regularizer.

Reinforcement Learning (RL)

How Fine-Tuning Allows for Effective Meta-Learning

no code implementations NeurIPS 2021 Kurtland Chua, Qi Lei, Jason D. Lee

Representation learning has been widely studied in the context of meta-learning, enabling rapid learning of new tasks through shared representations.

Few-Shot Learning Representation Learning

Bilinear Classes: A Structural Framework for Provable Generalization in RL

no code implementations 19 Mar 2021 Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.

MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning

no code implementations 23 Feb 2021 DiJia Su, Jason D. Lee, John M. Mulvey, H. Vincent Poor

We consider a setting that lies between pure offline reinforcement learning (RL) and pure online RL called deployment constrained RL in which the number of policy deployments for data sampling is limited.

Reinforcement Learning (RL) Uncertainty Quantification

A Theory of Label Propagation for Subpopulation Shift

no code implementations 22 Feb 2021 Tianle Cai, Ruiqi Gao, Jason D. Lee, Qi Lei

In this work, we propose a provably effective framework for domain adaptation based on label propagation.

Domain Adaptation Generalization Bounds

Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

no code implementations 17 Feb 2021 Yulai Zhao, Yuandong Tian, Jason D. Lee, Simon S. Du

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces.

Policy Gradient Methods

How to Characterize The Landscape of Overparameterized Convolutional Neural Networks

1 code implementation NeurIPS 2020 Yihong Gu, Weizhong Zhang, Cong Fang, Jason D. Lee, Tong Zhang

With the help of a new technique called {\it neural network grafting}, we demonstrate that even during the entire training process, feature distributions of differently initialized networks remain similar at each layer.

Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

no code implementations NeurIPS 2020 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$.

Q-Learning

Beyond Lazy Training for Over-parameterized Tensor Decomposition

no code implementations NeurIPS 2020 Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.

Tensor Decomposition

Impact of Representation Learning in Linear Bandits

no code implementations ICLR 2021 Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du

For the finite-action setting, we present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.

Representation Learning

How Important is the Train-Validation Split in Meta-Learning?

no code implementations 12 Oct 2020 Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

A common practice in meta-learning is to perform a train-validation split (\emph{train-val method}) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.

Meta-Learning

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot

1 code implementation NeurIPS 2020 Jingtong Su, Yihang Chen, Tianle Cai, Tianhao Wu, Ruiqi Gao, Li-Wei Wang, Jason D. Lee

In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods and surprisingly find that: (1) A set of methods which aims to find good subnetworks of the randomly-initialized network (which we call "initial tickets"), hardly exploits any information from the training data; (2) For the pruned networks obtained by these methods, randomly changing the preserved weights in each layer, while keeping the total number of preserved weights unchanged per layer, does not affect the final performance.
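
The second sanity check above can be sketched concretely: rearrange which weights survive within each layer while keeping the per-layer count of preserved weights fixed. This NumPy sketch is an illustration in the paper's spirit, not their released code:

```python
import numpy as np

def shuffle_masks_per_layer(masks, seed=0):
    """Randomly permute each layer's pruning mask, preserving the number
    of kept weights per layer (a 'layerwise rearranged ticket').
    If final accuracy is unchanged under this transform, the mask's
    specific positions carried little information."""
    rng = np.random.default_rng(seed)
    shuffled = []
    for m in masks:
        flat = m.flatten().copy()
        rng.shuffle(flat)          # move the kept positions around
        shuffled.append(flat.reshape(m.shape))
    return shuffled
```

The invariant worth checking is exactly the one the paper controls for: per-layer sparsity is identical before and after shuffling.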

Network Pruning

Generalized Leverage Score Sampling for Neural Networks

no code implementations NeurIPS 2020 Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu

Leverage score sampling is a powerful technique that originates from theoretical computer science, which can be used to speed up a large number of fundamental questions, e.g., linear regression, linear programming, semi-definite programming, the cutting plane method, graph sparsification, maximum matching, and max-flow.

Learning Theory regression

Predicting What You Already Know Helps: Provable Self-Supervised Learning

no code implementations NeurIPS 2021 Jason D. Lee, Qi Lei, Nikunj Saunshi, Jiacheng Zhuo

Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data to learn useful semantic representations.

Representation Learning Self-Supervised Learning

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

no code implementations NeurIPS 2020 Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry

We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks".

General Classification

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks

no code implementations 3 Jul 2020 Cong Fang, Jason D. Lee, Pengkun Yang, Tong Zhang

This new representation overcomes the degenerate situation where all the hidden units essentially have only one meaningful hidden unit in each middle layer, and further leads to a simpler representation of DNNs, for which the training objective can be reformulated as a convex optimization problem via suitable re-parameterization.

Towards Understanding Hierarchical Learning: Benefits of Neural Representations

no code implementations NeurIPS 2020 Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher

When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.

Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters

no code implementations NeurIPS 2020 Kaiyi Ji, Jason D. Lee, Yingbin Liang, H. Vincent Poor

Although model-agnostic meta-learning (MAML) is a very successful algorithm in meta-learning practice, it can have high computational cost because it updates all model parameters over both the inner loop of task-specific adaptation and the outer-loop of meta initialization training.

Meta-Learning

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

1 code implementation 15 Jun 2020 Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma

We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground-truth with an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms.
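
The object of study above, SGD with label noise, is easy to write down for a linear model with squared loss. A minimal sketch (hyperparameters are illustrative; nothing about implicit bias is claimed beyond the update rule itself):

```python
import numpy as np

def label_noise_sgd_step(w, x, y, lr=0.1, sigma=0.5, rng=None):
    """One SGD step on 0.5 * (w.x - y_noisy)^2, where the label is
    perturbed independently at every step -- the noise model whose
    implicit regularization the paper analyzes."""
    rng = rng or np.random.default_rng()
    noisy_y = y + sigma * rng.standard_normal()
    residual = float(w @ x) - noisy_y
    return w - lr * residual * x   # gradient of the noisy squared loss
```

Setting `sigma=0` recovers plain SGD on the clean label; the paper's point is that the `sigma > 0` trajectory is biased toward flatter (here, sparser) solutions.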

Distributed Estimation for Principal Component Analysis: an Enlarged Eigenspace Analysis

no code implementations 5 Apr 2020 Xi Chen, Jason D. Lee, He Li, Yun Yang

To abandon this eigengap assumption, we consider a new route in our analysis: instead of exactly identifying the top-$L$-dim eigenspace, we show that our estimator is able to cover the targeted top-$L$-dim population eigenspace.

Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting

no code implementations 23 Mar 2020 Lemeng Wu, Mao Ye, Qi Lei, Jason D. Lee, Qiang Liu

Recently, Liu et al. [19] proposed a splitting steepest descent (S2D) method that jointly optimizes the neural parameters and architectures based on progressively growing network structures by splitting neurons into multiple copies in a steepest descent fashion.

Few-Shot Learning via Learning the Representation, Provably

no code implementations ICLR 2021 Simon S. Du, Wei Hu, Sham M. Kakade, Jason D. Lee, Qi Lei

First, we study the setting where this common representation is low-dimensional and provide a fast rate of $O\left(\frac{\mathcal{C}\left(\Phi\right)}{n_1T} + \frac{k}{n_2}\right)$; here, $\Phi$ is the representation function class, $\mathcal{C}\left(\Phi\right)$ is its complexity measure, and $k$ is the dimension of the representation.

Few-Shot Learning Representation Learning

Kernel and Rich Regimes in Overparametrized Models

1 code implementation 20 Feb 2020 Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

no code implementations 17 Feb 2020 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.

Q-Learning

Neural Temporal-Difference Learning Converges to Global Optima

no code implementations NeurIPS 2019 Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning.

Q-Learning Reinforcement Learning (RL)

When Does Non-Orthogonal Tensor Decomposition Have No Spurious Local Minima?

no code implementations 22 Nov 2019 Maziar Sanjabi, Sina Baharlouei, Meisam Razaviyayn, Jason D. Lee

We study the optimization problem for decomposing $d$ dimensional fourth-order Tensors with $k$ non-orthogonal components.

Tensor Decomposition

SGD Learns One-Layer Networks in WGANs

no code implementations ICML 2020 Qi Lei, Jason D. Lee, Alexandros G. Dimakis, Constantinos Daskalakis

Generative adversarial networks (GANs) are a widely used framework for learning generative models.

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

no code implementations ICLR 2020 Yu Bai, Jason D. Lee

Recent theoretical work has established connections between over-parametrized neural networks and linearized models governed by Neural Tangent Kernels (NTKs).

Optimal transport mapping via input convex neural networks

2 code implementations ICML 2020 Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh, Jason D. Lee

Building upon recent advances in the field of input convex neural networks, we propose a new framework where the gradient of one convex function represents the optimal transport mapping.
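
The framework's key structural fact is that an optimal transport map can be written as the gradient of a convex function (Brenier's theorem). For a convex quadratic potential the map is linear, which gives a tiny checkable illustration (the quadratic stand-in is mine; the paper parameterizes the convex potential with an input convex neural network instead):

```python
import numpy as np

def transport_map_from_convex_potential(A, b):
    """For the convex potential f(x) = 0.5 * x^T A x + b^T x (A PSD),
    the induced transport map is its gradient, T(x) = A x + b.
    ICNNs generalize this construction beyond quadratics."""
    def T(x):
        return A @ x + b
    return T
```

Learning then amounts to fitting the convex potential so that the pushforward of the source distribution under `T` matches the target distribution.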

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

no code implementations1 Aug 2019 Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces.

Policy Gradient Methods

Convergence of Adversarial Training in Overparametrized Neural Networks

no code implementations NeurIPS 2019 Ruiqi Gao, Tianle Cai, Haochuan Li, Li-Wei Wang, Cho-Jui Hsieh, Jason D. Lee

Neural networks are vulnerable to adversarial examples, i.e., inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network.

Neural Temporal-Difference and Q-Learning Provably Converge to Global Optima

1 code implementation NeurIPS 2019 Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning.

Q-Learning

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models

no code implementations17 May 2019 Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry

With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization lead to margin maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.

Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods

1 code implementation NeurIPS 2019 Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D. Lee, Meisam Razaviyayn

In this paper, we study the problem in the non-convex regime and show that an $\varepsilon$-first-order stationary point of the game can be computed when one player's objective can be efficiently optimized to global optimality.

Provably Correct Automatic Sub-Differentiation for Qualified Programs

no code implementations NeurIPS 2018 Sham M. Kakade, Jason D. Lee

The Cheap Gradient Principle (Griewank, 2008), which states that the computational cost of computing a $d$-dimensional vector of partial derivatives of a scalar function is nearly the same (often within a factor of $5$) as that of simply computing the scalar function itself, is of central importance in optimization; it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions, which are subsequently used in black-box gradient-based optimization procedures.
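
The principle can be illustrated with a toy reverse-mode automatic differentiation sketch (an illustrative construction, not the paper's): a single backward sweep over the computation graph recovers all partial derivatives at roughly the cost of one forward evaluation.

```python
# Minimal reverse-mode autodiff sketch. Each Var records its parents and the
# local partial derivative along each edge; one backward sweep accumulates
# adjoints for every input simultaneously.
class Var:
    def __init__(self, value, parents=()):
        self.value = value      # forward value
        self.parents = parents  # (parent Var, local partial) pairs
        self.grad = 0.0         # accumulated adjoint

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(out):
    """Propagate adjoints from the output to every input (sums over all paths)."""
    stack = [(out, 1.0)]
    while stack:
        node, adjoint = stack.pop()
        node.grad += adjoint
        for parent, local in node.parents:
            stack.append((parent, local * adjoint))

x, y = Var(3.0), Var(4.0)
f = x * y + x          # f(x, y) = xy + x
backward(f)
print(x.grad, y.grad)  # df/dx = y + 1 = 5.0, df/dy = x = 3.0
```

The point of the paper is what happens when the program is only subdifferentiable; this sketch shows only the smooth base case.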

Gradient Descent Finds Global Minima of Deep Neural Networks

no code implementations9 Nov 2018 Simon S. Du, Jason D. Lee, Haochuan Li, Li-Wei Wang, Xiyu Zhai

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex.

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

no code implementations NeurIPS 2019 Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma

We prove that for infinite-width two-layer nets, noisy gradient descent optimizes the regularized neural net loss to a global minimum in polynomial iterations.

Provably Correct Automatic Subdifferentiation for Qualified Programs

no code implementations23 Sep 2018 Sham Kakade, Jason D. Lee

The Cheap Gradient Principle (Griewank, 2008), which states that the computational cost of computing the gradient of a scalar-valued function is nearly the same (often within a factor of $5$) as that of simply computing the function itself, is of central importance in optimization; it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions, which are subsequently used in black-box gradient-based optimization procedures.

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

no code implementations NeurIPS 2018 Simon S. Du, Wei Hu, Jason D. Lee

Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization.

Adding One Neuron Can Eliminate All Bad Local Minima

no code implementations NeurIPS 2018 Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant

One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.

Binary Classification General Classification

Stochastic subgradient method converges on tame functions

1 code implementation20 Apr 2018 Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee

This work considers the question: what convergence guarantees does the stochastic subgradient method have in the absence of smoothness and convexity?

Convergence of Gradient Descent on Separable Data

no code implementations5 Mar 2018 Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry

We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.

On the Power of Over-parametrization in Neural Networks with Quadratic Activation

1 code implementation ICML 2018 Simon S. Du, Jason D. Lee

We provide new theoretical insights on why over-parametrization is effective in learning neural networks.

On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

no code implementations NeurIPS 2018 Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, Jason D. Lee

A popular GAN formulation is based on the use of Wasserstein distance as a metric between probability distributions.

Better Generalization by Efficient Trust Region Method

no code implementations ICLR 2018 Xuanqing Liu, Jason D. Lee, Cho-Jui Hsieh

Solving this subproblem is non-trivial: existing methods achieve only a sub-linear convergence rate.

No Spurious Local Minima in a Two Hidden Unit ReLU Network

no code implementations ICLR 2018 Chenwei Wu, Jiajun Luo, Jason D. Lee

Deep learning models can be efficiently optimized via stochastic gradient descent, but there is little theoretical evidence to support this.


Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima

no code implementations ICML 2018 Simon S. Du, Jason D. Lee, Yuandong Tian, Barnabas Poczos, Aarti Singh

We consider the problem of learning a one-hidden-layer neural network with non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned.
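
The architecture in the excerpt can be written down directly. The sketch below (ReLU $\sigma$, illustrative inputs) evaluates $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$ over non-overlapping patches $\mathbf{Z}_j$:

```python
import numpy as np

def cnn_forward(Z, w, a):
    """f(Z, w, a) = sum_j a_j * relu(w^T Z_j).

    Z : (k, p) array whose rows are the k non-overlapping patches Z_j,
    w : (p,) shared convolutional filter,
    a : (k,) output weights.
    """
    return float(a @ np.maximum(Z @ w, 0.0))

Z = np.array([[1.0, 0.0], [0.0, -1.0]])  # two patches
w = np.array([1.0, 1.0])
a = np.array([1.0, 2.0])
print(cnn_forward(Z, w, a))  # relu([1, -1]) = [1, 0], so f = 1*1 + 2*0 = 1.0
```

Both `w` and `a` are the trainable parameters in the paper's setting; the patches `Z` are the (fixed) input.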

First-order Methods Almost Always Avoid Saddle Points

no code implementations20 Oct 2017 Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, Benjamin Recht

We establish that first-order methods avoid saddle points for almost all initializations.

When is a Convolutional Filter Easy To Learn?

no code implementations ICLR 2018 Simon S. Du, Jason D. Lee, Yuandong Tian

We show that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time and the convergence rate depends on the smoothness of the input distribution and the closeness of patches.

An inexact subsampled proximal Newton-type method for large-scale machine learning

no code implementations28 Aug 2017 Xuanqing Liu, Cho-Jui Hsieh, Jason D. Lee, Yuekai Sun

We propose a fast proximal Newton-type algorithm for minimizing regularized finite sums that returns an $\epsilon$-suboptimal point in $\tilde{\mathcal{O}}(d(n + \sqrt{\kappa d})\log(\frac{1}{\epsilon}))$ FLOPS, where $n$ is number of samples, $d$ is feature dimension, and $\kappa$ is the condition number.

BIG-bench Machine Learning

Theoretical insights into the optimization landscape of over-parameterized shallow neural networks

no code implementations16 Jul 2017 Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee

In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set.

Gradient Descent Can Take Exponential Time to Escape Saddle Points

no code implementations NeurIPS 2017 Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, Aarti Singh

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape.

A Flexible Framework for Hypothesis Testing in High-dimensions

no code implementations26 Apr 2017 Adel Javanmard, Jason D. Lee

By duality between hypotheses testing and confidence intervals, the proposed framework can be used to obtain valid confidence intervals for various functionals of the model parameters.

regression Two-sample testing +2

Statistical Inference for Model Parameters in Stochastic Gradient Descent

no code implementations27 Oct 2016 Xi Chen, Jason D. Lee, Xin T. Tong, Yichen Zhang

Second, for high-dimensional linear regression, using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal.

regression

Black-box Importance Sampling

no code implementations17 Oct 2016 Qiang Liu, Jason D. Lee

Importance sampling is widely used in machine learning and statistics, but its power is limited by the restriction of using simple proposals for which the importance weights can be tractably calculated.
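
The restriction the abstract refers to shows up in the classical estimator: the weights require evaluating both densities. A minimal sketch with a hypothetical Gaussian target and proposal (not the paper's black-box method, which removes exactly this requirement):

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def snis(f, x, target_logpdf, proposal_logpdf):
    """Self-normalized importance sampling estimate of E_p[f(X)] from
    proposal draws x; needs tractable (unnormalized) density ratios."""
    logw = target_logpdf(x) - proposal_logpdf(x)
    w = np.exp(logw - logw.max())  # stabilized weights
    return float(np.sum(w * f(x)) / w.sum())

# Hypothetical example: estimate E[X] = 2 under N(2, 1) using N(0, 2) draws.
x = rng.normal(0.0, 2.0, size=20000)
est = snis(lambda t: t, x,
           lambda t: normal_logpdf(t, 2.0, 1.0),
           lambda t: normal_logpdf(t, 0.0, 2.0))
print(est)  # close to 2
```

When the proposal is complex enough that `proposal_logpdf` is intractable, this estimator is unusable, which is the gap the paper addresses.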

BIG-bench Machine Learning

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data

no code implementations10 Oct 2016 Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro

Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data.

Communication-Efficient Distributed Statistical Inference

no code implementations25 May 2016 Michael I. Jordan, Jason D. Lee, Yun Yang

CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation and Bayesian inference.

Bayesian Inference Computational Efficiency

Matrix Completion has No Spurious Local Minimum

no code implementations NeurIPS 2016 Rong Ge, Jason D. Lee, Tengyu Ma

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.

Collaborative Filtering Matrix Completion +1

Gradient Descent Converges to Minimizers

no code implementations16 Feb 2016 Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht

We show that gradient descent converges to a local minimizer, almost surely with random initialization.

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation

no code implementations10 Feb 2016 Qiang Liu, Jason D. Lee, Michael I. Jordan

We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity with the reproducing kernel Hilbert space theory.
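
For a one-dimensional standard normal target and an RBF kernel, the resulting statistic has a closed form. The sketch below (a V-statistic version with an assumed fixed bandwidth, for illustration) needs only the score $\nabla_x \log p$ and samples from $q$:

```python
import numpy as np

rng = np.random.default_rng(0)

def ksd2(x, score, h=1.0):
    """Squared kernelized Stein discrepancy (V-statistic) between sample x
    and a density p with score function score(t) = d/dt log p(t), using an
    RBF kernel k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2 * h**2))
    kx = -d / h**2 * k                  # dk/dx
    ky = d / h**2 * k                   # dk/dy
    kxy = (1 / h**2 - d**2 / h**4) * k  # d^2 k / dx dy
    s = score(x)
    # Stein kernel: s(x) s(y) k + s(x) dk/dy + s(y) dk/dx + d^2k/dxdy
    u = s[:, None] * s[None, :] * k + s[:, None] * ky + s[None, :] * kx + kxy
    return float(u.mean())

score = lambda t: -t                 # score of the standard normal
good = rng.standard_normal(300)      # sample from the target
bad = good + 2.0                     # shifted sample, should be flagged
print(ksd2(good, score), ksd2(bad, score))
```

The statistic is near zero when the sample matches the target and grows with the mismatch, which is what makes it usable as a goodness-of-fit test statistic.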

Evaluating the statistical significance of biclusters

no code implementations NeurIPS 2015 Jason D. Lee, Yuekai Sun, Jonathan E. Taylor

Biclustering (also known as submatrix localization) is a problem of high practical relevance in exploratory analysis of high-dimensional data.

Learning Halfspaces and Neural Networks with Random Initialization

no code implementations25 Nov 2015 Yuchen Zhang, Jason D. Lee, Martin J. Wainwright, Michael I. Jordan

For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk $\epsilon>0$.

$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time

no code implementations13 Oct 2015 Yuchen Zhang, Jason D. Lee, Michael I. Jordan

The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in $(1/\epsilon,\log(1/\delta), F(k, L))$, where $F(k, L)$ is a function depending on $(k, L)$ and on the activation function, independent of the number of neurons.

Distributed Stochastic Variance Reduced Gradient Methods and A Lower Bound for Communication Complexity

no code implementations27 Jul 2015 Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang

We also prove a lower bound for the number of rounds of communication for a broad class of distributed first-order methods including the proposed algorithms in this paper.

Distributed Optimization

Selective Inference and Learning Mixed Graphical Models

no code implementations30 Jun 2015 Jason D. Lee

We present the Condition-on-Selection method that allows for valid selective inference, and study its application to the lasso, and several other selection algorithms.

Model Selection

Communication-efficient sparse regression: a one-shot approach

no code implementations14 Mar 2015 Jason D. Lee, Yuekai Sun, Qiang Liu, Jonathan E. Taylor

We devise a one-shot approach to distributed sparse regression in the high-dimensional setting.

regression

Scalable methods for nonnegative matrix factorizations of near-separable tall-and-skinny matrices

1 code implementation NeurIPS 2014 Austin R. Benson, Jason D. Lee, Bartek Rajwa, David F. Gleich

We demonstrate the efficacy of these algorithms on terabyte-sized synthetic matrices and real-world matrices from scientific computing and bioinformatics.

On model selection consistency of penalized M-estimators: a geometric theory

no code implementations NeurIPS 2013 Jason D. Lee, Yuekai Sun, Jonathan E. Taylor

Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure.

Model Selection

Using Multiple Samples to Learn Mixture Models

no code implementations NeurIPS 2013 Jason D. Lee, Ran Gilad-Bachrach, Rich Caruana

In the mixture models problem it is assumed that there are $K$ distributions $\theta_{1},\ldots,\theta_{K}$ and one gets to observe a sample from a mixture of these distributions with unknown coefficients.

On model selection consistency of regularized M-estimators

no code implementations31 May 2013 Jason D. Lee, Yuekai Sun, Jonathan E. Taylor

Regularized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure.

Model Selection

Proximal Newton-type methods for minimizing composite functions

1 code implementation7 Jun 2012 Jason D. Lee, Yuekai Sun, Michael A. Saunders

We generalize Newton-type methods for minimizing smooth functions to handle a sum of two convex functions: a smooth function and a nonsmooth function with a simple proximal mapping.
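
As an illustration of the proximal Newton step (with an assumed diagonal quadratic smooth part and an $\ell_1$ nonsmooth part; hypothetical problem data, not the paper's general algorithm), the Hessian-scaled proximal mapping reduces to coordinate-wise soft-thresholding:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_newton_l1(h_diag, b, lam, x0, iters=25):
    """Proximal Newton sketch for g(x) = 0.5 x^T diag(h) x - b^T x plus
    lam * ||x||_1. With a diagonal Hessian, the scaled proximal step is
    coordinate-wise soft-thresholding of the Newton update."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        grad = h_diag * x - b
        x = soft_threshold(x - grad / h_diag, lam / h_diag)
    return x

h = np.array([2.0, 1.0])
b = np.array([3.0, 0.5])
x = prox_newton_l1(h, b, lam=1.0, x0=np.zeros(2))
print(x)  # [1. 0.]: soft(b/h, lam/h) coordinate-wise
```

With a general (non-diagonal) Hessian the scaled proximal subproblem no longer has a closed form, which is where the paper's inexact subproblem solves and convergence analysis come in.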


Learning Mixed Graphical Models

no code implementations22 May 2012 Jason D. Lee, Trevor J. Hastie

We present a new pairwise model for graphical models with both continuous and discrete variables that is amenable to structure learning.

Practical Large-Scale Optimization for Max-norm Regularization

no code implementations NeurIPS 2010 Jason D. Lee, Ben Recht, Nathan Srebro, Joel Tropp, Ruslan R. Salakhutdinov

The max-norm was proposed as a convex matrix regularizer by Srebro et al. (2004) and was shown to be empirically superior to the trace-norm for collaborative filtering problems.

Clustering Collaborative Filtering
