no code implementations • 12 Jun 2023 • Taiji Suzuki, Denny Wu, Atsushi Nitanda
Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
no code implementations • 13 May 2023 • Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, Kenji Yamanishi
However, recent theoretical analyses have shown a much higher upper bound on the generalization error of non-Euclidean graph embedding than on that of its Euclidean counterpart, where a high generalization error indicates that incompleteness and noise in the data can significantly damage learning performance.
no code implementations • 6 Mar 2023 • Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki
Entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and an entropy term over the space of measures; such an objective arises naturally in the optimization of a two-layer neural network in the mean-field regime.
no code implementations • 18 Feb 2023 • Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda
Stochastic gradient descent is a workhorse for training deep neural networks due to its excellent generalization performance.
no code implementations • 12 Feb 2023 • Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki
Our bound is tighter than existing norm-based bounds when the condition numbers of weight matrices are small.
no code implementations • 25 Jan 2022 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.
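For reference, below is a minimal numpy sketch of the noisy particle-gradient update obtained from an Euler-Maruyama discretization of mean-field Langevin dynamics; the two-layer network, synthetic data, step size, and regularization strength are illustrative assumptions, not the paper's setting.

```python
import numpy as np

# Minimal sketch: noisy particle-gradient descent, an Euler-Maruyama
# discretization of mean-field Langevin dynamics.  The two-layer network,
# synthetic data, step size, and regularization strength are illustrative.

rng = np.random.default_rng(0)
n, d, m = 200, 5, 100                  # samples, input dim, particles (neurons)
X = rng.standard_normal((n, d))
y = np.tanh(X @ rng.standard_normal(d))

W = rng.standard_normal((m, d))        # each row is one particle
eta, lam = 0.05, 1e-2                  # step size, regularization strength

def predict(W):
    # mean-field prediction: average of tanh neurons over all particles
    return np.tanh(X @ W.T).mean(axis=1)

for _ in range(500):
    residual = predict(W) - y                           # (n,)
    act_grad = 1.0 - np.tanh(X @ W.T) ** 2              # (n, m)
    # gradient of the regularized empirical risk w.r.t. each particle
    grad = (act_grad * residual[:, None]).T @ X / (n * m) + lam * W
    noise = rng.standard_normal(W.shape)
    W = W - eta * grad + np.sqrt(2.0 * eta * lam) * noise   # Langevin step

print("final squared loss:", 0.5 * np.mean((predict(W) - y) ** 2))
```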
no code implementations • NeurIPS 2021 • Atsushi Suzuki, Atsushi Nitanda, Jing Wang, Linchuan Xu, Kenji Yamanishi, Marc Cavazza
Graph embedding, which represents real-world entities in a mathematical space, has enabled numerous applications such as analyzing natural language, social networks, biochemical networks, and knowledge bases. It has been experimentally shown that graph embedding in hyperbolic space can represent hierarchical tree-like data more effectively than embedding in linear space, owing to hyperbolic space's exponential growth property.
no code implementations • ICLR 2022 • Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu
We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean-field regime that achieves an exponential convergence rate in regularized empirical risk minimization.
no code implementations • 21 May 2021 • Atsushi Suzuki, Atsushi Nitanda, Jing Wang, Linchuan Xu, Marc Cavazza, Kenji Yamanishi
Hyperbolic ordinal embedding (HOE) represents entities as points in hyperbolic space so that they agree as well as possible with given constraints of the form "entity i is more similar to entity j than to entity k". It has been experimentally shown that HOE can effectively represent hierarchical data such as knowledge bases and citation networks, owing to hyperbolic space's exponential growth property.
no code implementations • NeurIPS 2021 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
An important application of the proposed method is the optimization of neural networks in the mean-field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which a quantitative convergence rate can be challenging to obtain.
no code implementations • 11 Mar 2021 • Yuto Mori, Atsushi Nitanda, Akiko Takeda
Model extraction attacks have become a serious issue for service providers using machine learning.
no code implementations • 31 Jul 2020 • Linchuan Xu, Jun Huang, Atsushi Nitanda, Ryo Asaoka, Kenji Yamanishi
In this paper, we thus propose a novel global spatial attention mechanism in CNNs mainly for medical image classification.
1 code implementation • 23 Jul 2020 • Shintaro Fukushima, Atsushi Nitanda, Kenji Yamanishi
We address the relation between two parameters: one is the step size of the stochastic approximation, and the other is the threshold on the norm of the stochastic update.
no code implementations • ICLR 2021 • Atsushi Nitanda, Taiji Suzuki
In this study, we show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate, with a global convergence guarantee, by exploiting the complexities of the target function and the RKHS associated with the NTK.
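For context, here is a minimal sketch of averaged (Polyak-Ruppert) stochastic gradient descent on a synthetic least-squares problem; the data, constant step size, and uniform iterate averaging are generic assumptions and not the paper's NTK regression setting.

```python
import numpy as np

# Minimal sketch of averaged (Polyak-Ruppert) SGD on least squares.
# The synthetic data, constant step size, and uniform averaging are
# generic illustrations, not the paper's NTK regression setting.

rng = np.random.default_rng(1)
n, d = 1000, 20
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

w = np.zeros(d)        # current SGD iterate
w_bar = np.zeros(d)    # running average of the iterates
eta = 0.01

for t in range(1, n + 1):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]    # stochastic gradient of 0.5*(x_i.w - y_i)^2
    w = w - eta * grad                  # plain SGD step
    w_bar += (w - w_bar) / t            # online update of the iterate average

print("last iterate error:", np.linalg.norm(w - w_star))
print("averaged iterate error:", np.linalg.norm(w_bar - w_star))
```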
no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
While second-order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.
no code implementations • 13 Nov 2019 • Shingo Yashima, Atsushi Nitanda, Taiji Suzuki
To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms.
no code implementations • NeurIPS 2021 • Taiji Suzuki, Atsushi Nitanda
The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.
1 code implementation • NeurIPS 2019 • Satoshi Hara, Atsushi Nitanda, Takanori Maehara
Data cleansing is a typical approach to improving the accuracy of machine learning models; however, it requires extensive domain knowledge to identify the influential instances that affect the models.
no code implementations • 23 May 2019 • Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki
Most existing studies, with only a few exceptions, have focused on regression problems with the squared loss, and the importance of the positivity of the neural tangent kernel has been pointed out.
no code implementations • 14 Jun 2018 • Atsushi Nitanda, Taiji Suzuki
In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.
no code implementations • ICML 2018 • Atsushi Nitanda, Taiji Suzuki
Residual Networks (ResNets) have become state-of-the-art models in deep learning, and several theoretical studies have been devoted to understanding why ResNets work so well.
no code implementations • 7 Jan 2018 • Atsushi Nitanda, Taiji Suzuki
In this paper, this phenomenon is explained by viewing the gradient layer from the perspective of the functional gradient method.
no code implementations • 14 Dec 2017 • Atsushi Nitanda, Taiji Suzuki
The superior performance of ensemble methods with infinite models is well known.
no code implementations • 9 Jun 2015 • Atsushi Nitanda
We propose an optimization method for minimizing finite sums of smooth convex functions.
no code implementations • NeurIPS 2014 • Atsushi Nitanda
Accelerated proximal gradient descent (APG) and the proximal stochastic variance reduced gradient method (Prox-SVRG) are in a trade-off relationship.
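As background for this trade-off, here is a minimal sketch of the SVRG outer/inner loop on a smooth finite-sum least-squares objective; the data, epoch length, and step size are illustrative assumptions, and the proximal step used by Prox-SVRG for composite objectives is omitted for simplicity.

```python
import numpy as np

# Minimal sketch of SVRG on a smooth finite-sum least-squares objective.
# Epoch length, step size, and data are illustrative; the proximal step
# used by Prox-SVRG for composite objectives is omitted for simplicity.

rng = np.random.default_rng(2)
n, d = 500, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.05 * rng.standard_normal(n)

def full_grad(w):
    # full gradient of 0.5/n * ||Xw - y||^2
    return X.T @ (X @ w - y) / n

w_snap = np.zeros(d)
eta, epoch_len = 0.01, n

for epoch in range(20):
    mu = full_grad(w_snap)                         # full gradient at the snapshot
    w = w_snap.copy()
    for _ in range(epoch_len):
        i = rng.integers(n)
        g_i = (X[i] @ w - y[i]) * X[i]             # stochastic gradient at w
        g_i_snap = (X[i] @ w_snap - y[i]) * X[i]   # same component at the snapshot
        w = w - eta * (g_i - g_i_snap + mu)        # variance-reduced update
    w_snap = w                                     # use the last inner iterate as the new snapshot

print("objective:", 0.5 * np.mean((X @ w_snap - y) ** 2))
```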