no code implementations • 30 Jan 2024 • Joshua Hanson, Maxim Raginsky
In this net, the output "weights" are taken from the signature of the control input -- a tool used to represent infinite-dimensional paths as a sequence of tensors -- which comprises iterated integrals of the control input over a simplex.
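For concreteness, the low-order signature terms of a piecewise-linear path can be computed directly: each linear segment has an explicit signature, and segments are stitched together with Chen's identity. The sketch below (ours, not the paper's implementation) truncates at depth 2.

```python
# Minimal sketch: depth-2 signature of a piecewise-linear path via
# Chen's identity. A path is a list of d-dimensional points.

def segment_sig(p, q):
    """Depth-1 and depth-2 signature of one linear segment from p to q."""
    d = len(p)
    inc = [q[i] - p[i] for i in range(d)]            # level 1: the increment
    lvl2 = [[inc[i] * inc[j] / 2.0 for j in range(d)] for i in range(d)]
    return inc, lvl2

def chen(sig_a, sig_b):
    """Concatenate two signatures (Chen's identity, truncated at depth 2)."""
    a1, a2 = sig_a
    b1, b2 = sig_b
    d = len(a1)
    c1 = [a1[i] + b1[i] for i in range(d)]
    c2 = [[a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(d)]
          for i in range(d)]
    return c1, c2

def signature(path):
    """Truncated (depth-2) signature of a piecewise-linear path."""
    sig = segment_sig(path[0], path[1])
    for k in range(1, len(path) - 1):
        sig = chen(sig, segment_sig(path[k], path[k + 1]))
    return sig
```

The level-1 term is the total increment; the antisymmetric part of the level-2 term is the Lévy area, and higher levels (omitted here) refine the description of the path further.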
no code implementations • 23 Jan 2024 • Dylan Zhang, Curt Tigges, Zory Zhang, Stella Biderman, Maxim Raginsky, Talia Ringer
The framework includes a representation that captures the general \textit{syntax} of structural recursion, coupled with two different frameworks for understanding their \textit{semantics} -- one that is more natural from a programming languages perspective and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture.
no code implementations • 8 Sep 2023 • Fredrik Hellström, Giuseppe Durisi, Benjamin Guedj, Maxim Raginsky
Over the past decades, the PAC-Bayesian approach has been established as a flexible framework for analyzing the generalization capabilities of machine learning algorithms and for designing new ones.
no code implementations • 1 Jul 2023 • Tanya Veeravalli, Maxim Raginsky
The problem of function approximation by neural dynamical systems has typically been approached in a top-down manner: any continuous function can be approximated to arbitrary accuracy by a sufficiently complex model with a given architecture.
no code implementations • 24 May 2023 • Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, Talia Ringer
Neural networks have in recent years shown promise for helping software engineers write programs and even formally verify them.
no code implementations • NeurIPS 2023 • Yifeng Chu, Maxim Raginsky
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
no code implementations • 4 May 2023 • Yifeng Chu, Maxim Raginsky
The majorizing measure theorem of Fernique and Talagrand is a fundamental result in the theory of random processes.
no code implementations • 27 Apr 2023 • Yifeng Chu, Maxim Raginsky
We obtain an upper bound on the expected supremum of a Bernoulli process indexed by the image of an index set under a uniformly Lipschitz function class in terms of properties of the index set and the function class, extending an earlier result of Maurer for Gaussian processes.
no code implementations • 16 Mar 2023 • Belinda Tzen, Anant Raj, Maxim Raginsky, Francis Bach
Mirror descent, introduced by Nemirovski and Yudin in the 1970s, is a primal-dual convex optimization method that can be tailored to the geometry of the optimization problem at hand through the choice of a strongly convex potential function.
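The canonical instance of tailoring the potential to the geometry is mirror descent on the probability simplex with the negative-entropy potential, which reduces to a multiplicative (exponentiated-gradient) update. A minimal sketch, with an illustrative linear objective of our choosing:

```python
import math

# Mirror descent on the probability simplex with the negative-entropy
# potential (exponentiated gradient). Minimizes the linear objective
# f(x) = <c, x> over the simplex; the minimizer concentrates on the
# coordinate with the smallest cost c_i.

def mirror_descent_simplex(c, steps=200, eta=0.5):
    n = len(c)
    x = [1.0 / n] * n                      # start at the uniform distribution
    for _ in range(steps):
        # gradient of the linear objective is just c; the mirror step with
        # negative entropy becomes a multiplicative update + renormalization
        w = [xi * math.exp(-eta * ci) for xi, ci in zip(x, c)]
        z = sum(w)
        x = [wi / z for wi in w]
    return x

x = mirror_descent_simplex([3.0, 1.0, 2.0])
```

With the Euclidean potential the same scheme recovers projected gradient descent; the entropic choice is what adapts the method to simplex geometry.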
no code implementations • 1 Dec 2022 • Tanya Veeravalli, Maxim Raginsky
There has been a great deal of recent interest in learning and approximation of functions that can be expressed as expectations of a given nonlinearity with respect to its random internal parameters.
no code implementations • 3 Apr 2022 • Joshua Hanson, Maxim Raginsky
This paper describes an approach for fitting an immersed submanifold of a finite-dimensional Euclidean space to random samples.
no code implementations • 14 Feb 2022 • Alan Yang, Jie Xiong, Maxim Raginsky, Elyse Rosenbaum
This paper proposes a class of neural ordinary differential equations parametrized by provably input-to-state stable continuous-time recurrent neural networks.
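The stability mechanism can be illustrated with a toy continuous-time recurrent net integrated by forward Euler. This is a sketch under our own assumptions (the parameterization and the sufficient condition below are illustrative, not the paper's): with dynamics x' = -a*x + W*tanh(x) + B*u and a exceeding the norm of W, the 1-Lipschitz tanh makes the unforced dynamics contractive, so bounded inputs yield bounded states — the flavor of input-to-state stability.

```python
import math, random

# Forward-Euler simulation of x' = -a*x + W*tanh(x) + B*u(t).
# Illustrative stability condition: a > ||W|| makes the dynamics
# contractive, so the state forgets its initial condition and stays
# bounded for bounded inputs.

def simulate(a, W, B, u, x0, dt=0.01, T=1000):
    n = len(x0)
    x = list(x0)
    for t in range(T):
        tanh_x = [math.tanh(xi) for xi in x]
        dx = [-a * x[i]
              + sum(W[i][j] * tanh_x[j] for j in range(n))
              + B[i] * u(t * dt)
              for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x

random.seed(0)
n = 3
W = [[0.3 * random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x_final = simulate(a=2.0, W=W, B=[1.0] * n, u=lambda t: math.sin(t),
                   x0=[5.0] * n)
```

Despite the large initial condition, the trajectory contracts into a bounded neighborhood determined by the input magnitude.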
1 code implementation • NeurIPS 2021 • Hrayr Harutyunyan, Maxim Raginsky, Greg Ver Steeg, Aram Galstyan
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm.
no code implementations • 29 Dec 2020 • Aolin Xu, Maxim Raginsky
We analyze the best achievable performance of Bayesian learning under generative models by defining and upper-bounding the minimum excess risk (MER): the gap between the minimum expected loss attainable by learning from data and the minimum expected loss that could be achieved if the model realization were known.
no code implementations • 18 Nov 2020 • Joshua Hanson, Maxim Raginsky, Eduardo Sontag
We consider the following learning problem: Given sample pairs of input and output signals generated by an unknown nonlinear system (which is not assumed to be causal or time-invariant), we wish to find a continuous-time recurrent neural net with hyperbolic tangent activation function that approximately reproduces the underlying i/o behavior with high confidence.
no code implementations • L4DC 2020 • Joshua Hanson, Maxim Raginsky
It is well-known that continuous-time recurrent neural nets are universal approximators for continuous-time dynamical systems.
no code implementations • 24 Mar 2020 • Naci Saldi, Tamer Basar, Maxim Raginsky
In this paper, we consider discrete-time partially observed mean-field games with the risk-sensitive optimality criterion.
no code implementations • 5 Feb 2020 • Belinda Tzen, Maxim Raginsky
We first consider the mean-field limit, where the finite population of neurons in the hidden layer is replaced by a continuum, and show that our problem can be phrased as global minimization of a free-energy functional on the space of probability measures over the weights.
1 code implementation • 12 Nov 2019 • Alan Yang, AmirEmad Ghassami, Maxim Raginsky, Negar Kiyavash, Elyse Rosenbaum
In the second step, CI testing is performed by applying the $k$-NN conditional mutual information estimator to the learned feature maps.
no code implementations • NeurIPS 2019 • Joshua Hanson, Maxim Raginsky
There has been a recent shift in sequence-to-sequence modeling from recurrent network architectures to convolutional network architectures, owing to the latter's computational advantages in training and operation while still achieving competitive performance.
no code implementations • 23 May 2019 • Belinda Tzen, Maxim Raginsky
In deep latent Gaussian models, the latent variable is generated by a time-inhomogeneous Markov chain, where at each time step we pass the current state through a parametric nonlinear map, such as a feedforward neural net, and add a small independent Gaussian perturbation.
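The generative chain just described is easy to write down concretely. In the hypothetical sketch below, a fixed random one-layer tanh net stands in for the trained feedforward map at each step; all sizes and constants are ours, for illustration.

```python
import math, random

# Time-inhomogeneous Markov chain for a deep latent Gaussian model:
# at each of K steps, pass the current state through a nonlinear map
# and add a small independent Gaussian perturbation.

random.seed(1)
D, K, sigma = 4, 8, 0.05

def layer_map(z, W, b):
    # one-layer tanh net; W, b play the role of learned parameters
    return [math.tanh(sum(W[i][j] * z[j] for j in range(D)) + b[i])
            for i in range(D)]

# one (W, b) pair per step -- the chain is time-inhomogeneous
params = [([[random.gauss(0, 0.5) for _ in range(D)] for _ in range(D)],
           [random.gauss(0, 0.1) for _ in range(D)])
          for _ in range(K)]

z = [random.gauss(0, 1) for _ in range(D)]       # z_0 ~ N(0, I)
for W, b in params:
    mean = layer_map(z, W, b)
    z = [m + sigma * random.gauss(0, 1) for m in mean]   # Gaussian noise
```

Shrinking the per-step noise while growing the number of steps is exactly the regime in which such chains approximate a diffusion process.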
no code implementations • 5 Mar 2019 • Belinda Tzen, Maxim Raginsky
We introduce and study a class of probabilistic generative models, where the latent object is a finite-dimensional diffusion process on a finite time interval and the observed variable is drawn conditionally on the terminal point of the diffusion.
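A schematic version of such a model can be simulated with the Euler–Maruyama scheme: run a diffusion on [0, 1] and draw the observation conditionally on the terminal point. The drift and the observation rule below are stand-ins of our own choosing, not the paper's.

```python
import random

# Latent diffusion  dZ_t = b(Z_t) dt + dW_t  on [0, 1], simulated by
# Euler--Maruyama; the observed variable depends only on the terminal
# point Z_1 (here, for illustration, its sign).

random.seed(2)

def sample_observation(n_steps=100):
    dt = 1.0 / n_steps
    z = 0.0
    for _ in range(n_steps):
        drift = -z                        # illustrative mean-reverting drift
        z += drift * dt + random.gauss(0, 1) * dt ** 0.5
    return 1 if z > 0 else 0              # observation given the terminal point

obs = [sample_observation() for _ in range(200)]
```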
no code implementations • 23 Dec 2018 • Jaeho Lee, Maxim Raginsky
This paper generalizes the Maurer--Pontil framework of finite-dimensional lossy coding schemes to the setting where a high-dimensional random vector is mapped to an element of a compact set of latent representations in a lower-dimensional Euclidean space, and the reconstruction map belongs to a given class of nonlinear maps.
no code implementations • 18 Feb 2018 • Belinda Tzen, Tengyuan Liang, Maxim Raginsky
For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, at least one of the following two events will occur: (1) the Langevin trajectory ends up somewhere outside the $\varepsilon$-neighborhood of this particular optimum within a short recurrence time; (2) it enters this $\varepsilon$-neighborhood by the recurrence time and stays there until a potentially exponentially long escape time.
no code implementations • NeurIPS 2017 • Aolin Xu, Maxim Raginsky
We derive upper bounds on the generalization error of a learning algorithm in terms of the mutual information between its input and output.
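The headline bound, as we recall it, applies when the loss $\ell(w, Z)$ is $\sigma$-subgaussian for every hypothesis $w$ and the learning algorithm maps the $n$-sample $S$ to an output hypothesis $W$:

```latex
\[
  \bigl| \mathbb{E}\,\mathrm{gen}(S, W) \bigr|
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)},
\]
```

so the less the output hypothesis depends on the training sample, in the mutual-information sense, the smaller the expected generalization gap.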
no code implementations • NeurIPS 2018 • Jaeho Lee, Maxim Raginsky
As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data.
no code implementations • 19 May 2017 • Mehmet A. Donmez, Maxim Raginsky, Andrew C. Singer
We present a generic framework for trading off fidelity and cost in computing stochastic gradients when the costs of acquiring stochastic gradients of different quality are not known a priori.
no code implementations • 13 Feb 2017 • Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky
Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration.
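The update rule described above fits in a few lines. The objective and the constants in this sketch are ours, for illustration only: a quadratic f(w) = w²/2, a noisy gradient estimate standing in for a minibatch gradient, and inverse temperature beta scaling the injected Gaussian noise.

```python
import math, random

# Stochastic Gradient Langevin Dynamics: a gradient step on an unbiased
# gradient estimate plus properly scaled isotropic Gaussian noise.
# Stationary distribution is approximately proportional to exp(-beta * f).

random.seed(3)

def sgld(steps=5000, eta=0.01, beta=10.0):
    w = 5.0
    for _ in range(steps):
        grad_est = w + random.gauss(0, 0.1)        # unbiased gradient estimate
        noise = math.sqrt(2.0 * eta / beta) * random.gauss(0, 1)
        w = w - eta * grad_est + noise
    return w

w = sgld()
```

Unlike plain SGD, the iterate does not converge to the minimizer but keeps fluctuating around it, sampling from a Gibbs-like distribution concentrated near low-loss regions.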
no code implementations • 31 Aug 2015 • Soomin Lee, Angelia Nedić, Maxim Raginsky
In ODA-C, to mitigate the disagreements on the primal-vector updates, the agents implement, over a static undirected graph, a generalization of the local information-exchange dynamics recently proposed by Li and Marden.
no code implementations • 14 Jan 2014 • Peng Guan, Maxim Raginsky, Rebecca Willett
This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space.
no code implementations • 28 Oct 2013 • Peng Guan, Maxim Raginsky, Rebecca Willett
Online learning algorithms are designed to perform in non-stationary environments, but generally there is no notion of a dynamic state to model constraints on current and future actions as a function of past actions.
no code implementations • 1 Jul 2013 • Maxim Raginsky, Angelia Nedić
We study a model of collective real-time decision-making (or learning) in a social network operating in an uncertain environment, for which no a priori probabilistic model is available.
no code implementations • 19 Dec 2012 • Maxim Raginsky, Igal Sason
This monograph focuses on some of the key modern mathematical tools that are used for the derivation of concentration inequalities, on their links to information theory, and on their various applications to communications and coding.
no code implementations • NeurIPS 2011 • Maxim Raginsky, Alexander Rakhlin
For passive learning, our lower bounds match the upper bounds of Gine and Koltchinskii up to constants and generalize analogous results of Massart and Nedelec.
no code implementations • NeurIPS 2009 • Maxim Raginsky, Svetlana Lazebnik
This paper addresses the problem of designing binary codes for high-dimensional data such that vectors that are similar in the original space map to similar binary strings.
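The design goal can be seen in miniature with the classic random-hyperplane (SimHash) baseline — not the paper's scheme, but it illustrates the target property: vectors at a small angle receive binary codes at a small Hamming distance.

```python
import random

# Random-hyperplane hashing: each bit is the sign of a projection onto a
# random Gaussian direction, so Hamming distance between codes tracks the
# angle between the original vectors.

random.seed(4)

def make_hasher(dim, n_bits):
    planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    def code(x):
        return [1 if sum(p[i] * x[i] for i in range(dim)) > 0 else 0
                for p in planes]
    return code

def hamming(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

code = make_hasher(dim=16, n_bits=64)
x = [random.gauss(0, 1) for _ in range(16)]
near = [xi + 0.01 * random.gauss(0, 1) for xi in x]   # a small perturbation
far = [-xi for xi in x]                               # the antipodal point
```

The perturbed copy shares almost all code bits with the original, while the antipodal point flips every bit.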
no code implementations • NeurIPS 2008 • Maxim Raginsky, Svetlana Lazebnik, Rebecca Willett, Jorge Silva
This paper describes a recursive estimation procedure for multivariate binary densities using orthogonal expansions.