no code implementations • 11 Apr 2024 • Yunxiang Li, Rui Yuan, Chen Fan, Mark Schmidt, Samuel Horváth, Robert M. Gower, Martin Takáč
Policy gradient is a widely utilized and foundational algorithm in the field of reinforcement learning (RL).
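As a refresher on the basic method, here is a minimal REINFORCE-style policy gradient sketch; the toy bandit environment and the step size are illustrative placeholders, not taken from the paper:

```python
import numpy as np

# Minimal REINFORCE on a 3-armed bandit with a softmax policy.
# The environment, rewards, and step size are illustrative choices.
rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # expected reward of each arm
theta = np.zeros(3)                       # policy parameters (logits)
step_size = 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for t in range(2000):
    probs = softmax(theta)
    action = rng.choice(3, p=probs)
    reward = rng.normal(true_means[action], 0.1)
    # REINFORCE estimate: grad of log pi(action) times the observed reward.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += step_size * reward * grad_log_pi

print("learned action probabilities:", softmax(theta))
```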
no code implementations • 3 Apr 2024 • Aaron Mishkin, Mert Pilanci, Mark Schmidt
This improvement is comparable to a square-root of the condition number in the worst case and addresses criticism that guarantees for stochastic acceleration could be worse than those for SGD.
no code implementations • 29 Feb 2024 • Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti
We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics.
no code implementations • 3 Jul 2023 • Amrutha Varshini Ramesh, Aaron Mishkin, Mark Schmidt, Yihan Zhou, Jonathan Wilder Lavington, Jennifer She
We show that bound- and summation-constrained steepest descent in the L1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.
1 code implementation • 27 Apr 2023 • Frederik Kunstner, Jacques Chen, Jonathan Wilder Lavington, Mark Schmidt
This suggests that Adam outperforms SGD because it uses a more robust gradient estimate.
no code implementations • 2 Apr 2023 • Chen Fan, Christos Thrampoulidis, Mark Schmidt
Modern machine learning models are often over-parameterized and as a result they can interpolate the training data.
1 code implementation • 20 Feb 2023 • Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations.
1 code implementation • 6 Feb 2023 • Jonathan Wilder Lavington, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Nicolas Le Roux
Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a \emph{target space} (e.g., the logits output by a linear model for classification) that can be minimized efficiently.
1 code implementation • 29 Jul 2022 • Jonathan Wilder Lavington, Sharan Vaswani, Mark Schmidt
Specifically, if the class of policies is sufficiently expressive to contain the expert policy, we prove that DAgger achieves constant regret.
no code implementations • 22 Jul 2021 • Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces.
no code implementations • 18 Feb 2021 • Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien
Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate.
no code implementations • 15 Feb 2021 • Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations.
no code implementations • 11 Jan 2021 • Sedigheh Zolaktaf, Frits Dannenberg, Mark Schmidt, Anne Condon, Erik Winfree
We then compare the performance of pathway elaboration with the stochastic simulation algorithm (SSA) for MFPT estimation on 237 of the reactions for which SSA is feasible.
1 code implementation • 31 Dec 2020 • Andrew Warrington, J. Wilder Lavington, Adam Ścibior, Mark Schmidt, Frank Wood
Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes.
no code implementations • 2 Nov 2020 • Frederik Kunstner, Raunak Kumar, Mark Schmidt
In this work we first show that for the common setting of exponential family distributions, viewing EM as a mirror descent algorithm leads to convergence rates in Kullback-Leibler (KL) divergence.
no code implementations • NeurIPS 2020 • Yihan Zhou, Victor S. Portella, Mark Schmidt, Nicholas J. A. Harvey
We extend the known regret bounds for classical OCO algorithms to the relative setting.
no code implementations • 2 Oct 2020 • Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik
Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago.
no code implementations • 28 Sep 2020 • Sharan Vaswani, Issam H. Laradji, Frederik Kunstner, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien
Under an interpolation assumption, we prove that AMSGrad with a constant step-size and momentum can converge to the minimizer at the faster $O(1/T)$ rate for smooth, convex functions.
1 code implementation • 11 Jun 2020 • Sharan Vaswani, Issam Laradji, Frederik Kunstner, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien
In this setting, we prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate.
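A rough statement of the setting, in notation of my choosing rather than quoted from the paper: interpolation requires every component loss to be stationary at the overall minimizer, and under this assumption the stated rate holds for smooth convex objectives.

```latex
\[
f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad
\text{interpolation: } \nabla f_i(x^\star) = 0 \ \ \forall i
\ \text{ at } x^\star \in \arg\min_x f(x),
\qquad
f(\bar{x}_T) - f(x^\star) = O(1/T).
\]
```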
1 code implementation • ICML 2020 • Wu Lin, Mark Schmidt, Mohammad Emtiyaz Khan
The Bayesian learning rule is a natural-gradient variational inference method, which not only contains many existing learning algorithms as special cases but also enables the design of new algorithms.
1 code implementation • 29 Oct 2019 • Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt
Our generalization enables us to establish a connection between Stein's lemma and the reparameterization trick to derive gradients of expectations of a large class of functions under weak assumptions.
1 code implementation • 11 Oct 2019 • Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien
Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size.
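A minimal sketch of one regularized subsampled-Newton iteration on logistic regression; the damping, batch size, and step size below are illustrative placeholders rather than the adaptive choices analyzed in the paper:

```python
import numpy as np

# Regularized subsampled Newton: full-batch gradient, Hessian from a subsample,
# damped Newton direction. Constants here are illustrative, not from the paper.
rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
damping, batch_size, step_size = 1e-3, 128, 1.0

for k in range(50):
    # Gradient of the average logistic loss over the full dataset.
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n
    # Hessian estimated on a random subsample only.
    idx = rng.choice(n, size=batch_size, replace=False)
    Xs = X[idx]
    ps = sigmoid(Xs @ w)
    H = Xs.T @ (Xs * (ps * (1 - ps))[:, None]) / batch_size
    # Damped Newton direction: solve (H + damping * I) direction = grad.
    direction = np.linalg.solve(H + damping * np.eye(d), grad)
    w -= step_size * direction

p = sigmoid(X @ w)
loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
print("final training loss:", float(loss))
```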
1 code implementation • 8 Jul 2019 • Frederik Hauser, Mark Schmidt, Michael Menth
If the user is permitted to use the RAC on a managed host, launching the RAC is authorized and access to protected network resources may be given, e.g., to internal networks, servers, or the Internet.
Networking and Internet Architecture · Cryptography and Security
1 code implementation • 2 Jul 2019 • Issam H. Laradji, David Vazquez, Mark Schmidt
A major obstacle in instance segmentation is that existing methods often need many per-pixel labels in order to be effective.
Ranked #7 on Image-level Supervised Instance Segmentation on PASCAL VOC 2012 val (using extra training data)
Image-level Supervised Instance Segmentation · Semantic Segmentation
no code implementations • 14 Jun 2019 • Issam H. Laradji, Negar Rostamzadeh, Pedro O. Pinheiro, David Vazquez, Mark Schmidt
Instance segmentation methods often require costly per-pixel labels.
1 code implementation • 7 Jun 2019 • Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt
Natural-gradient methods enable fast and simple algorithms for variational inference, but due to computational difficulties, their use is mostly limited to \emph{minimal} exponential-family (EF) approximations.
1 code implementation • NeurIPS 2019 • Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien
To improve the proposed methods' practical performance, we give heuristics to use larger step-sizes and acceleration.
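The core proposed method is SGD with a stochastic Armijo line-search evaluated on the sampled mini-batch. A minimal sketch on least squares follows; the backtracking constants are common defaults, not necessarily those from the paper's experiments:

```python
import numpy as np

# SGD with a stochastic Armijo line-search: on each mini-batch, backtrack the
# step size until sufficient decrease holds on that same mini-batch.
rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def batch_loss_grad(w, idx):
    r = X[idx] @ w - y[idx]
    return 0.5 * np.mean(r ** 2), X[idx].T @ r / len(idx)

w = np.zeros(d)
eta_max, c, beta = 10.0, 0.5, 0.7   # initial step, sufficient-decrease constant, backtracking factor

for t in range(200):
    idx = rng.choice(n, size=32, replace=False)
    loss, grad = batch_loss_grad(w, idx)
    eta = eta_max
    # Backtrack until the Armijo condition holds on the sampled mini-batch.
    while batch_loss_grad(w - eta * grad, idx)[0] > loss - c * eta * (grad @ grad):
        eta *= beta
    w -= eta * grad

print("distance to w_true:", np.linalg.norm(w - w_true))
```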
1 code implementation • 16 May 2019 • Issam H. Laradji, Mark Schmidt, Vladimir Pavlovic, Minyoung Kim
The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle a variable-sized input as well as learn deep long-range dependency structures of the data.
no code implementations • 20 Mar 2019 • Mehrdad Ghadiri, Mark Schmidt
In this paper, we consider this problem as an optimization problem that seeks to maximize the sum of a sum-sum diversity function and a non-negative monotone submodular function.
2 code implementations • NeurIPS 2018 • Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan
Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution.
no code implementations • 16 Oct 2018 • Sharan Vaswani, Francis Bach, Mark Schmidt
Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions.
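The condition in question is the strong growth condition, which can be written (in notation of my choosing) as:

```latex
\[
\mathbb{E}_i\!\left[\|\nabla f_i(x)\|^2\right] \;\le\; \rho \,\|\nabla f(x)\|^2
\qquad \text{for all } x,
\]
```

which in particular implies $\nabla f_i(x^\star) = 0$ for every $i$ at any stationary point $x^\star$ of $f$, i.e. the model interpolates the data.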
no code implementations • 10 Oct 2018 • Mohamed Osama Ahmed, Sharan Vaswani, Mark Schmidt
Indeed, in a particular setting, we prove that using the Lipschitz information yields the same or a better bound on the regret compared to using Bayesian optimization on its own.
3 code implementations • 13 Sep 2018 • Alireza Shafaei, Mark Schmidt, James J. Little
What makes this problem different from a typical supervised learning setting is that the distribution of outliers used in training may not be the same as the distribution of outliers encountered in the application.
3 code implementations • ECCV 2018 • Issam H. Laradji, Negar Rostamzadeh, Pedro O. Pinheiro, David Vazquez, Mark Schmidt
However, we propose a detection-based method that does not need to estimate the size and shape of the objects and that outperforms regression-based methods.
Ranked #1 on Object Counting on Pascal VOC 2007 count-test
no code implementations • 24 May 2018 • Sharan Vaswani, Branislav Kveton, Zheng Wen, Anup Rao, Mark Schmidt, Yasin Abbasi-Yadkori
We investigate the use of bootstrapping in the bandit setting.
1 code implementation • 23 Dec 2017 • Julie Nutini, Issam Laradji, Mark Schmidt
Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure.
Optimization and Control · 90C06
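For concreteness, a minimal randomized block coordinate descent sketch on ridge regression; the block size, per-block step sizes, and problem are illustrative choices, not taken from the paper:

```python
import numpy as np

# Randomized block coordinate descent on ridge regression: each iteration
# updates one block of coordinates using the gradient restricted to that block.
rng = np.random.default_rng(0)
n, d, block_size, lam = 400, 60, 10, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)

w = np.zeros(d)
blocks = [np.arange(i, i + block_size) for i in range(0, d, block_size)]
# A safe per-block step size: inverse of a Lipschitz bound for that block.
block_L = [np.linalg.norm(X[:, b], 2) ** 2 / n + lam for b in blocks]

for t in range(500):
    j = rng.integers(len(blocks))
    b = blocks[j]
    grad_b = X[:, b].T @ (X @ w - y) / n + lam * w[b]   # gradient on block b only
    w[b] -= grad_b / block_L[j]

print("objective:", 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * (w @ w))
```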
3 code implementations • ICLR 2018 • Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice.
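The method is hypergradient descent: the learning rate is itself updated online using the inner product of consecutive gradients. A minimal sketch of the SGD variant on a toy quadratic (the problem and the constants are illustrative, not from the paper's experiments):

```python
import numpy as np

# Hypergradient-descent SGD on a toy quadratic: the learning rate alpha is
# adapted using the dot product of the current and previous gradients.
rng = np.random.default_rng(0)
d = 50
A = rng.normal(size=(d, d))
A = A.T @ A / d + np.eye(d)          # positive-definite quadratic
b = rng.normal(size=d)

def grad(x):
    return A @ x - b

x = np.zeros(d)
alpha, beta = 1e-3, 1e-5             # initial learning rate and hyper step size
g_prev = grad(x)

for t in range(5000):
    g = grad(x)
    alpha += beta * (g @ g_prev)     # hypergradient update of the learning rate
    x -= alpha * g
    g_prev = g

print("final alpha:", alpha, "gradient norm:", np.linalg.norm(grad(x)))
```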
no code implementations • 7 Mar 2017 • Sharan Vaswani, Mark Schmidt, Laks. V. S. Lakshmanan
The gang of bandits (GOB) model \cite{cesa2013gang} is a recent contextual bandits framework that shares information between a set of bandit problems, related by a known (possibly noisy) graph.
no code implementations • ICML 2017 • Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt
We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.
6 code implementations • 13 Dec 2016 • Tian Qi Chen, Mark Schmidt
This results in a procedure for artistic style transfer that is efficient but also allows arbitrary content and style images.
no code implementations • 16 Aug 2016 • Hamed Karimi, Julie Nutini, Mark Schmidt
In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent.
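The condition, now usually called the Polyak–Łojasiewicz inequality, states that for some $\mu > 0$:

```latex
\[
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr)
\qquad \text{for all } x,
\]
```

and together with $L$-smoothness it yields the linear rate $f(x_k) - f^\star \le (1 - \mu/L)^k\,(f(x_0) - f^\star)$ for gradient descent with step size $1/L$.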
no code implementations • 5 Aug 2016 • Alireza Shafaei, James J. Little, Mark Schmidt
We present experiments assessing the effectiveness on real-world data of systems trained on synthetic RGB images that are extracted from a video game.
no code implementations • NeurIPS 2015 • Reza Harikandeh, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen
We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods.
no code implementations • 5 Nov 2015 • Reza Babanezhad, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen
We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods.
no code implementations • 31 Oct 2015 • Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, Masashi Sugiyama
We also give a convergence-rate analysis of our method and many other previous methods which exploit the geometry of the space.
no code implementations • 1 Jun 2015 • Julie Nutini, Mark Schmidt, Issam H. Laradji, Michael Friedlander, Hoyt Koepke
There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 2012].
no code implementations • 16 Apr 2015 • Mark Schmidt, Reza Babanezhad, Mohamed Osama Ahmed, Aaron Defazio, Ann Clifton, Anoop Sarkar
We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs).
no code implementations • 27 Feb 2015 • Sharan Vaswani, Laks. V. S. Lakshmanan, Mark Schmidt
We consider \emph{influence maximization}: the problem of maximizing the number of people who become aware of a product by finding the `best' set of `seed' users to expose the product to.
no code implementations • 6 Feb 2015 • Guang-Tong Zhou, Sung Ju Hwang, Mark Schmidt, Leonid Sigal, Greg Mori
We present a hierarchical maximum-margin clustering method for unsupervised data analysis.
no code implementations • 4 Nov 2014 • Volkan Cevher, Stephen Becker, Mark Schmidt
This article reviews recent advances in convex optimization algorithms for Big Data, which aim to reduce the computational, storage, and communications bottlenecks.
2 code implementations • 10 Sep 2013 • Mark Schmidt, Nicolas Le Roux, Francis Bach
Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations.
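A minimal stochastic average gradient (SAG) sketch on least squares; the step size and zero initialization of the gradient table are simple practical choices, not the constants from the analysis:

```python
import numpy as np

# SAG: keep the most recent gradient of every example and take a step along
# the average of the stored gradients, updating the average in O(d) per step.
rng = np.random.default_rng(0)
n, d = 300, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)

w = np.zeros(d)
stored = np.zeros((n, d))      # table of per-example gradients
avg = np.zeros(d)              # running average of the table
step_size = 0.1 / np.max(np.sum(X ** 2, axis=1))

for t in range(20 * n):
    i = rng.integers(n)
    g_new = (X[i] @ w - y[i]) * X[i]        # gradient of example i at current w
    avg += (g_new - stored[i]) / n          # update the stored average
    stored[i] = g_new
    w -= step_size * avg

print("training loss:", 0.5 * np.mean((X @ w - y) ** 2))
```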
no code implementations • NeurIPS 2012 • Nicolas L. Roux, Mark Schmidt, Francis R. Bach
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex.
no code implementations • NeurIPS 2011 • Mark Schmidt, Nicolas L. Roux, Francis R. Bach
We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the second term.
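For reference, the inexact proximal-gradient iteration studied here can be written (notation mine) as:

```latex
\[
x_{k+1} \;=\; \operatorname{prox}_{\alpha g}\!\bigl(x_k - \alpha\,(\nabla f(x_k) + e_k)\bigr),
\qquad
\operatorname{prox}_{\alpha g}(z) \;=\; \arg\min_{x}\;\tfrac{1}{2}\|x - z\|^2 + \alpha\, g(x),
\]
```

where $e_k$ is the error in the gradient of the smooth term $f$, and the proximity operator of the non-smooth term $g$ may itself be computed only approximately.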
no code implementations • NeurIPS 2008 • Peter Carbonetto, Mark Schmidt, Nando D. Freitas
The stochastic approximation method is behind the solution to many important, actively-studied problems in machine learning.