Search Results for author: Hans De Sterck

Found 11 papers, 6 papers with code

First-order PDEs for Graph Neural Networks: Advection and Burgers Equation Models

no code implementations • 3 Apr 2024 • Yifan Qu, Oliver Krzysik, Hans De Sterck, Omer Ege Kara

Graph Neural Networks (GNNs) have established themselves as the preferred methodology in a multitude of domains, ranging from computer vision to computational biology, especially in contexts where data inherently conform to graph structures.
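
The advection model named in the title transports node features along directed edges. The sketch below shows one explicit-Euler step of a conservation-form graph advection equation; the function name, the velocity matrix V, and the discretization are illustrative assumptions, not taken from the paper.

    import numpy as np

    def advection_step(x, V, dt=0.1):
        # x: (n, d) node features; V: (n, n) nonnegative velocities, V[i, j] = flow j -> i.
        inflow = V @ x                          # features transported in from neighbours
        outflow = V.sum(axis=0)[:, None] * x    # features carried out along outgoing edges
        return x + dt * (inflow - outflow)      # explicit Euler step; total feature mass is conserved

    rng = np.random.default_rng(0)
    V = rng.random((5, 5)) * (1 - np.eye(5))    # random edge velocities, no self-loops
    x = rng.standard_normal((5, 3))
    print(advection_step(x, V).shape)           # (5, 3)

In a GNN layer the velocities would be learned from node and edge features rather than fixed.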

Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences

no code implementations • 18 Oct 2023 • Yanming Kang, Giang Tran, Hans De Sterck

The overall complexity of Fast Multipole Attention is $\mathcal{O}(n)$ or $\mathcal{O}(n \log n)$, depending on whether the queries are down-sampled or not.

Language Modelling
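
The stated complexity reflects a hierarchical, divide-and-conquer evaluation: queries attend to nearby keys at full resolution and to distant keys only through downsampled summaries. Below is a two-level toy version of that idea; the block size, mean-pooling, and function names are illustrative assumptions rather than the paper's exact construction.

    import numpy as np

    def softmax(a, axis=-1):
        a = a - a.max(axis=axis, keepdims=True)
        e = np.exp(a)
        return e / e.sum(axis=axis, keepdims=True)

    def two_level_attention(q, k, v, block=4):
        # q, k, v: (n, d) with n divisible by block. Fine keys inside each
        # block; mean-pooled (coarse) keys/values for all other blocks.
        n, d = q.shape
        nb = n // block
        k_coarse = k.reshape(nb, block, d).mean(axis=1)
        v_coarse = v.reshape(nb, block, d).mean(axis=1)
        out = np.empty_like(q)
        for b in range(nb):
            sl = slice(b * block, (b + 1) * block)
            far = np.arange(nb) != b                  # every coarse block except our own
            keys = np.vstack([k[sl], k_coarse[far]])  # fine local + coarse far context
            vals = np.vstack([v[sl], v_coarse[far]])
            w = softmax(q[sl] @ keys.T / np.sqrt(d))
            out[sl] = w @ vals
        return out

    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
    print(two_level_attention(q, k, v).shape)         # (16, 8)

Each query here touches block + n/block - 1 keys instead of n; extending the two levels to a full multilevel hierarchy is what brings the cost down to $\mathcal{O}(n \log n)$ or $\mathcal{O}(n)$.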

Downlink Compression Improves TopK Sparsification

no code implementations • 30 Sep 2022 • William Zou, Hans De Sterck, Jun Liu

One of the largest bottlenecks in distributed training is communicating gradients across different nodes.
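
TopK sparsification attacks that bottleneck by transmitting only the k largest-magnitude gradient entries; this paper's point is that the downlink (the aggregated gradient broadcast back to workers) is worth compressing too. A minimal sketch of TopK with error feedback, a standard companion technique assumed here for illustration:

    import numpy as np

    def topk_sparsify(grad, k):
        # Keep the k largest-magnitude entries, zero the rest.
        idx = np.argpartition(np.abs(grad), -k)[-k:]
        sparse = np.zeros_like(grad)
        sparse[idx] = grad[idx]
        return sparse, grad - sparse        # compressed gradient, residual error

    rng = np.random.default_rng(0)
    grad, error = rng.standard_normal(1000), np.zeros(1000)
    for step in range(3):
        compressed, error = topk_sparsify(grad + error, k=10)  # error feedback reinjects the residual
        # ... send `compressed` uplink; per this paper, compress the aggregated downlink as well ...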

Neural Lyapunov Control of Unknown Nonlinear Systems with Stability Guarantees

1 code implementation • 4 Jun 2022 • Ruikun Zhou, Thanin Quartz, Hans De Sterck, Jun Liu

This paper proposes a learning framework to simultaneously stabilize an unknown nonlinear system with a neural controller and learn a neural Lyapunov function to certify a region of attraction (ROA) for the closed-loop system.

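Such certificates are typically trained by penalizing violations of the Lyapunov conditions $V(x) > 0$ and $\dot V(x) = \langle \nabla V(x), f(x) \rangle < 0$ on samples from the candidate region of attraction. A minimal sketch of that loss in PyTorch, using a generic formulation and a stand-in dynamics f rather than the paper's exact construction:

    import torch

    def lyapunov_loss(V, f, x):
        # V: neural Lyapunov candidate; f: closed-loop dynamics; x: (batch, dim) samples.
        x = x.requires_grad_(True)
        v = V(x).squeeze(-1)
        grad_v = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
        v_dot = (grad_v * f(x)).sum(dim=1)           # dV/dt along trajectories of f
        pos = torch.relu(1e-3 - v).mean()            # V should be positive away from the origin
        dec = torch.relu(v_dot + 1e-3).mean()        # V should decrease along trajectories
        return pos + dec

    V = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
    f = lambda x: -x + 0.1 * torch.sin(x)            # stand-in for the unknown stabilized system
    loss = lyapunov_loss(V, f, torch.randn(64, 2))
    loss.backward()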

Anderson Acceleration as a Krylov Method with Application to Asymptotic Convergence Analysis

no code implementations • 29 Sep 2021 • Hans De Sterck, Yunhui He, Oliver A. Krzysik

As a step towards better understanding convergence acceleration by AA, we study AA($m$), i.e., Anderson acceleration with finite window size $m$, applied to the case of linear fixed-point iterations $x_{k+1} = M x_k + b$.
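
Concretely, AA($m$) combines the last $m+1$ iterates of the fixed-point map $q(x) = Mx + b$ with least-squares weights computed from the residuals $r_k = q(x_k) - x_k$. A minimal sketch in the standard Walker-Ni difference form (illustrative, not the paper's code):

    import numpy as np

    def aa(q, x0, m=2, iters=30):
        # Anderson acceleration AA(m) with finite window size m.
        x, Q, R = x0, [], []
        for k in range(iters):
            qx = q(x)
            r = qx - x
            Q.append(qx); R.append(r)
            mk = min(m, k)
            if mk == 0:
                x = qx                               # plain fixed-point step
            else:
                # Columns are residual differences Δr_{k-1}, ..., Δr_{k-mk}.
                dR = np.stack([R[-i] - R[-i - 1] for i in range(1, mk + 1)], axis=1)
                dQ = np.stack([Q[-i] - Q[-i - 1] for i in range(1, mk + 1)], axis=1)
                gamma, *_ = np.linalg.lstsq(dR, r, rcond=None)
                x = qx - dQ @ gamma                  # least-squares combination of past q-values
        return x

    rng = np.random.default_rng(0)
    M = 0.9 * np.linalg.qr(rng.standard_normal((20, 20)))[0]  # spectral radius 0.9
    b = rng.standard_normal(20)
    q = lambda x: M @ x + b
    x = aa(q, np.zeros(20))
    print(np.linalg.norm(q(x) - x))      # far below the residual of 30 plain iterations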

Linear Asymptotic Convergence of Anderson Acceleration: Fixed-Point Analysis

no code implementations • 29 Sep 2021 • Hans De Sterck, Yunhui He

However, we show that, despite the discontinuity of $\beta(z)$, the iteration function $\Psi(z)$ is Lipschitz continuous and directionally differentiable at $z^*$ for AA(1), and we generalize this to AA($m$) with $m>1$ for most cases.

N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations

1 code implementation • 22 Oct 2020 • Aaron Baier-Reinio, Hans De Sterck

We use neural ordinary differential equations to formulate a variant of the Transformer that is depth-adaptive in the sense that the ordinary differential equation solver takes an input-dependent number of time steps.

Machine Translation
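
In other words, a fixed stack of identical layers is replaced by a continuous-depth block $h'(t) = f(t, h(t))$ integrated by an adaptive solver, so the effective depth (number of solver steps) varies with the input. A toy sketch using SciPy's adaptive RK45 with a stand-in block in place of a learned Transformer layer (the dynamics below are illustrative only):

    import numpy as np
    from scipy.integrate import solve_ivp

    d = 8
    W = np.random.default_rng(0).standard_normal((d, d)) / np.sqrt(d)

    def block(t, h):
        # Stand-in for a Transformer layer's continuous dynamics f(t, h).
        return np.tanh(W @ h) - h

    def node_forward(h0):
        sol = solve_ivp(block, (0.0, 1.0), h0, method="RK45", rtol=1e-4, atol=1e-6)
        return sol.y[:, -1], sol.t.size               # final state, solver steps taken

    for name, h0 in [("easy", np.zeros(d)), ("hard", 10 * np.ones(d))]:
        _, steps = node_forward(h0)
        print(name, "input:", steps, "solver steps")  # step counts differ by input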

On the Asymptotic Linear Convergence Speed of Anderson Acceleration Applied to ADMM

1 code implementation • 6 Jul 2020 • Da-Wei Wang, Yunhui He, Hans De Sterck

In this paper we explain and quantify this improvement in linear asymptotic convergence speed for the special case of a stationary version of AA applied to ADMM.
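
"Stationary" here means the AA mixing coefficient is frozen at a fixed value instead of being recomputed by least squares at every step, which makes the accelerated iteration linear and its asymptotic convergence factor tractable. A minimal sketch of stationary AA(1) wrapped around a generic fixed-point map q, such as one ADMM sweep (the linear map below is only a stand-in):

    import numpy as np

    def stationary_aa1(q, x0, beta, iters=100):
        # Stationary AA(1): x_{k+1} = (1 - beta) q(x_k) + beta q(x_{k-1}).
        x_prev, x = x0, q(x0)
        q_prev = q(x_prev)
        for _ in range(iters):
            qx = q(x)
            x_prev, x = x, (1 - beta) * qx + beta * q_prev
            q_prev = qx
        return x

    M, b = np.diag([0.9, 0.5, 0.1]), np.ones(3)       # stand-in for an ADMM iteration map
    q = lambda x: M @ x + b
    x = stationary_aa1(q, np.zeros(3), beta=-0.4)
    print(np.linalg.norm(q(x) - x))                   # small fixed-point residual

Sweeping beta and measuring the observed convergence factor is one way to reproduce numerically the kind of quantification the paper carries out analytically.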

On the Asymptotic Linear Convergence Speed of Anderson Acceleration, Nesterov Acceleration, and Nonlinear GMRES

1 code implementation • 4 Jul 2020 • Hans De Sterck, Yunhui He

Since AA and NGMRES are equivalent to GMRES in the linear case, one may expect the GMRES convergence factors to be relevant for AA and NGMRES as $x_k \rightarrow x^*$.

Tensor Decomposition

Nesterov Acceleration of Alternating Least Squares for Canonical Tensor Decomposition: Momentum Step Size Selection and Restart Mechanisms

1 code implementation • 13 Oct 2018 • Drew Mitchell, Nan Ye, Hans De Sterck

While Nesterov acceleration turns gradient descent into an optimal first-order method for convex problems by adding a momentum term with a specific weight sequence, a direct application of this method and weight sequence to ALS results in erratic convergence behaviour.

Tensor Decomposition
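
For reference, the weight sequence in question is Nesterov's $t_{k+1} = (1 + \sqrt{1 + 4 t_k^2})/2$ with extrapolation weight $(t_k - 1)/t_{k+1}$. A minimal sketch of that schedule on plain gradient descent, the convex baseline the abstract contrasts with (not the ALS variant the paper develops):

    import numpy as np

    def nesterov_gd(grad, x0, lr, iters=100):
        x, y, t = x0, x0, 1.0
        for _ in range(iters):
            x_new = y - lr * grad(y)                      # gradient step at the extrapolated point
            t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2      # Nesterov's weight sequence
            y = x_new + ((t - 1) / t_new) * (x_new - x)   # momentum / extrapolation step
            x, t = x_new, t_new
        return x

    A = np.diag([1.0, 10.0])                              # f(x) = x^T A x / 2
    x = nesterov_gd(lambda x: A @ x, np.array([5.0, 5.0]), lr=0.1)
    print(np.linalg.norm(x))                              # close to the minimizer at the origin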
