no code implementations • 12 Jun 2023 • Taiji Suzuki, Denny Wu, Atsushi Nitanda
Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
no code implementations • 13 May 2023 • Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, Kenji Yamanishi
However, recent theoretical analyses have shown a much higher upper bound on the generalization error of non-Euclidean graph embedding than on that of its Euclidean counterpart, where a high generalization error indicates that incompleteness and noise in the data can significantly damage learning performance.
no code implementations • 6 Mar 2023 • Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki
Entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and an entropy term over the space of measures; such an objective arises naturally in the optimization of a two-layer neural network in the mean-field regime.
no code implementations • 18 Feb 2023 • Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda
Stochastic gradient descent is a workhorse for training deep neural networks due to its excellent generalization performance.
no code implementations • 12 Feb 2023 • Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki
Our bound is tighter than existing norm-based bounds when the condition numbers of weight matrices are small.
no code implementations • 25 Jan 2022 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.
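For reference, below is a minimal numpy sketch of the noisy particle-gradient update obtained from an Euler-Maruyama discretization of mean-field Langevin dynamics; the two-layer network, synthetic data, step size, and regularization strength are illustrative assumptions, not the paper's setting.

```python
import numpy as np

# Minimal sketch: noisy particle-gradient descent, an Euler-Maruyama
# discretization of mean-field Langevin dynamics.  The two-layer network,
# synthetic data, step size, and regularization strength are illustrative.

rng = np.random.default_rng(0)
n, d, m = 200, 5, 100                  # samples, input dim, particles (neurons)
X = rng.standard_normal((n, d))
y = np.tanh(X @ rng.standard_normal(d))

W = rng.standard_normal((m, d))        # each row is one particle
eta, lam = 0.05, 1e-2                  # step size, regularization strength

def predict(W):
    # mean-field prediction: average of tanh neurons over all particles
    return np.tanh(X @ W.T).mean(axis=1)

for _ in range(500):
    residual = predict(W) - y                           # (n,)
    act_grad = 1.0 - np.tanh(X @ W.T) ** 2              # (n, m)
    # gradient of the regularized empirical risk w.r.t. each particle
    grad = (act_grad * residual[:, None]).T @ X / (n * m) + lam * W
    noise = rng.standard_normal(W.shape)
    W = W - eta * grad + np.sqrt(2.0 * eta * lam) * noise   # Langevin step

print("final squared loss:", 0.5 * np.mean((predict(W) - y) ** 2))
```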
no code implementations • NeurIPS 2021 • Atsushi Suzuki, Atsushi Nitanda, Jing Wang, Linchuan Xu, Kenji Yamanishi, Marc Cavazza
Graph embedding, which represents real-world entities in a mathematical space, has enabled numerous applications such as analyzing natural language, social networks, biochemical networks, and knowledge bases. It has been experimentally shown that graph embedding in hyperbolic space can represent hierarchical tree-like data more effectively than embedding in linear space, owing to hyperbolic space's exponential growth property.
no code implementations • ICLR 2022 • Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu
We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean-field regime that achieves an exponential convergence rate in regularized empirical risk minimization.
no code implementations • 21 May 2021 • Atsushi Suzuki, Atsushi Nitanda, Jing Wang, Linchuan Xu, Marc Cavazza, Kenji Yamanishi
Hyperbolic ordinal embedding (HOE) represents entities as points in hyperbolic space so that they agree as well as possible with given constraints of the form "entity i is more similar to entity j than to entity k". It has been experimentally shown that HOE can effectively represent hierarchical data such as knowledge bases and citation networks, owing to hyperbolic space's exponential growth property.
no code implementations • NeurIPS 2021 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
An important application of the proposed method is the optimization of neural networks in the mean-field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which a quantitative convergence rate can be challenging to obtain.
no code implementations • 11 Mar 2021 • Yuto Mori, Atsushi Nitanda, Akiko Takeda
Model extraction attacks have become a serious issue for service providers using machine learning.
no code implementations • 31 Jul 2020 • Linchuan Xu, Jun Huang, Atsushi Nitanda, Ryo Asaoka, Kenji Yamanishi
In this paper, we thus propose a novel global spatial attention mechanism in CNNs mainly for medical image classification.
1 code implementation • 23 Jul 2020 • Shintaro Fukushima, Atsushi Nitanda, Kenji Yamanishi
We address the relation between two parameters: one is the step size of the stochastic approximation, and the other is the threshold on the norm of the stochastic update.
no code implementations • ICLR 2021 • Atsushi Nitanda, Taiji Suzuki
In this study, we show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate, with a global convergence guarantee, by exploiting the complexities of the target function and the RKHS associated with the NTK.
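For context, here is a minimal sketch of averaged (Polyak-Ruppert) stochastic gradient descent on a synthetic least-squares problem; the data, constant step size, and uniform iterate averaging are generic assumptions and not the paper's NTK regression setting.

```python
import numpy as np

# Minimal sketch of averaged (Polyak-Ruppert) SGD on least squares.
# The synthetic data, constant step size, and uniform averaging are
# generic illustrations, not the paper's NTK regression setting.

rng = np.random.default_rng(1)
n, d = 1000, 20
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

w = np.zeros(d)        # current SGD iterate
w_bar = np.zeros(d)    # running average of the iterates
eta = 0.01

for t in range(1, n + 1):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]    # stochastic gradient of 0.5*(x_i.w - y_i)^2
    w = w - eta * grad                  # plain SGD step
    w_bar += (w - w_bar) / t            # online update of the iterate average

print("last iterate error:", np.linalg.norm(w - w_star))
print("averaged iterate error:", np.linalg.norm(w_bar - w_star))
```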
no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
While second-order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.
no code implementations • 13 Nov 2019 • Shingo Yashima, Atsushi Nitanda, Taiji Suzuki
To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms.
no code implementations • NeurIPS 2021 • Taiji Suzuki, Atsushi Nitanda
The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.
1 code implementation • NeurIPS 2019 • Satoshi Hara, Atsushi Nitanda, Takanori Maehara
Data cleansing is a typical approach to improving the accuracy of machine learning models; however, it requires extensive domain knowledge to identify the influential instances that affect the models.
no code implementations • 23 May 2019 • Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki
Most existing studies, with only a few exceptions, have focused on regression problems with the squared loss, and the importance of the positivity of the neural tangent kernel has been pointed out.
no code implementations • 14 Jun 2018 • Atsushi Nitanda, Taiji Suzuki
In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.
no code implementations • ICML 2018 • Atsushi Nitanda, Taiji Suzuki
Residual Networks (ResNets) have become state-of-the-art models in deep learning, and several theoretical studies have been devoted to understanding why ResNets work so well.
no code implementations • 7 Jan 2018 • Atsushi Nitanda, Taiji Suzuki
In this paper, this phenomenon is explained by viewing the gradient layer from the perspective of the functional gradient method.
no code implementations • 14 Dec 2017 • Atsushi Nitanda, Taiji Suzuki
The superior performance of ensemble methods with infinite models is well known.
no code implementations • 9 Jun 2015 • Atsushi Nitanda
We propose an optimization method for minimizing finite sums of smooth convex functions.
no code implementations • NeurIPS 2014 • Atsushi Nitanda
Accelerated proximal gradient descent (APG) and the proximal stochastic variance reduced gradient method (Prox-SVRG) are in a trade-off relationship.
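As background for this trade-off, here is a minimal sketch of the SVRG outer/inner loop on a smooth finite-sum least-squares objective; the data, epoch length, and step size are illustrative assumptions, and the proximal step used by Prox-SVRG for composite objectives is omitted for simplicity.

```python
import numpy as np

# Minimal sketch of SVRG on a smooth finite-sum least-squares objective.
# Epoch length, step size, and data are illustrative; the proximal step
# used by Prox-SVRG for composite objectives is omitted for simplicity.

rng = np.random.default_rng(2)
n, d = 500, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.05 * rng.standard_normal(n)

def full_grad(w):
    # full gradient of 0.5/n * ||Xw - y||^2
    return X.T @ (X @ w - y) / n

w_snap = np.zeros(d)
eta, epoch_len = 0.01, n

for epoch in range(20):
    mu = full_grad(w_snap)                         # full gradient at the snapshot
    w = w_snap.copy()
    for _ in range(epoch_len):
        i = rng.integers(n)
        g_i = (X[i] @ w - y[i]) * X[i]             # stochastic gradient at w
        g_i_snap = (X[i] @ w_snap - y[i]) * X[i]   # same component at the snapshot
        w = w - eta * (g_i - g_i_snap + mu)        # variance-reduced update
    w_snap = w                                     # use the last inner iterate as the new snapshot

print("objective:", 0.5 * np.mean((X @ w_snap - y) ** 2))
```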