Search Results for author: Stanley J. Osher

Found 24 papers, 13 papers with code

Wasserstein proximal operators describe score-based generative models and resolve memorization

no code implementations 9 Feb 2024 Benjamin J. Zhang, Siting Liu, Wuchen Li, Markos A. Katsoulakis, Stanley J. Osher

Via a Cole-Hopf transformation and taking advantage of the fact that the cross-entropy can be related to a linear functional of the density, we show that the Hamilton-Jacobi-Bellman (HJB) equation is an uncontrolled Fokker-Planck (FP) equation.

Inductive Bias, Memorization
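
For orientation, a generic form of the Cole-Hopf linearization invoked above (the paper's exact HJB and Fokker-Planck equations may differ; the constant diffusion coefficient $\beta$ here is an illustrative assumption): the substitution $\phi = e^{-u/(2\beta)}$ turns a viscous Hamilton-Jacobi equation into the heat equation,

$\partial_t u + \tfrac{1}{2}|\nabla u|^2 = \beta \Delta u, \qquad \phi = e^{-u/(2\beta)} \;\Longrightarrow\; \partial_t \phi = \beta \Delta \phi.$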

PDE Generalization of In-Context Operator Networks: A Study on 1D Scalar Nonlinear Conservation Laws

1 code implementation 14 Jan 2024 Liu Yang, Stanley J. Osher

We show positive evidence for the second question, i.e., that ICON can generalize well to some PDEs with new forms without any fine-tuning.

Operator learning

Fine-Tune Language Models as Multi-Modal Differential Equation Solvers

1 code implementation 9 Aug 2023 Liu Yang, Siting Liu, Stanley J. Osher

In the growing domain of scientific machine learning, in-context operator learning has shown notable potential for building foundation models: in this framework, the model is trained to learn operators and solve differential equations from prompted data at inference time, without weight updates.

Efficient Neural Network, Language Modelling, +1

In-Context Operator Learning with Data Prompts for Differential Equation Problems

2 code implementations 17 Apr 2023 Liu Yang, Siting Liu, Tingwei Meng, Stanley J. Osher

This paper introduces a new neural-network-based approach, namely In-Context Operator Networks (ICON), to simultaneously learn operators from the prompted data and apply them to new questions during the inference stage, without any weight update.

Operator learning
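
As a rough sketch of the data-prompt idea above (the array shapes, the pairing of conditions with quantities of interest, and the function name make_prompt are simplifying assumptions, not ICON's actual interface):

import numpy as np

# Each demo pairs a "condition" (e.g. a forcing term) with the "quantity of
# interest" (e.g. the induced solution) produced by one and the same operator.
def make_prompt(demo_conditions, demo_qois, query_condition):
    """Stack a few solved examples plus one unsolved query into a single prompt array."""
    examples = [np.concatenate([c, q]) for c, q in zip(demo_conditions, demo_qois)]
    query = np.concatenate([query_condition, np.zeros_like(demo_qois[0])])
    return np.stack(examples + [query])   # shape: (num_demos + 1, cond_dim + qoi_dim)

# Hypothetical usage: five demos of a 1D operator sampled on a 32-point grid.
rng = np.random.default_rng(0)
conds = [rng.standard_normal(32) for _ in range(5)]
qois = [0.1 * np.cumsum(c) for c in conds]          # stand-in "solution" operator
prompt = make_prompt(conds, qois, rng.standard_normal(32))
print(prompt.shape)   # (6, 64); a trained model would infer the query's QoI in context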

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization

no code implementations 1 Aug 2022 Tan Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang

Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence.

Image Generation, Machine Translation
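
To illustrate the quadratic-versus-linear trade-off mentioned above (a generic kernelized linearization with a simple positivity-preserving feature map, not the momentum-based construction of the paper):

import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the n-by-n score matrix costs O(n^2) time and memory."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, feat=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized linearization: forming feat(K)^T V first avoids the n-by-n matrix."""
    KV = feat(K).T @ V                        # (d, d_v), independent of sequence length
    Z = feat(Q) @ feat(K).sum(axis=0)         # per-query normalizer, shape (n,)
    return (feat(Q) @ KV) / Z[:, None]

n, d = 256, 16
Q = K = V = np.random.default_rng(1).standard_normal((n, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)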

Transformer with Fourier Integral Attentions

no code implementations 1 Jun 2022 Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho

Multi-head attention underpins the recent success of transformers, the state-of-the-art models in sequence modeling and beyond.

Image Classification, Language Modelling, +1

Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs

no code implementations 19 Apr 2022 Justin Baker, Hedi Xia, Yiwei Wang, Elena Cherkaev, Akil Narayan, Long Chen, Jack Xin, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Learning neural ODEs often requires solving very stiff ODE systems, which is primarily done with explicit adaptive step-size ODE solvers.

Computational Efficiency
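
For intuition on the proximal view of implicit solvers (a generic gradient-flow illustration, not necessarily the specific solvers proposed in the paper): when the vector field is the negative gradient of a convex potential $F$, one backward-Euler step is exactly a proximal step,

$x_{k+1} = x_k - \Delta t\, \nabla F(x_{k+1}) \;\Longleftrightarrow\; x_{k+1} = \operatorname{prox}_{\Delta t F}(x_k) := \arg\min_x \Big( F(x) + \tfrac{1}{2\Delta t}\|x - x_k\|^2 \Big),$

which stays stable on stiff problems where an explicit step would force a very small $\Delta t$.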

Parameter Inference of Time Series by Delay Embeddings and Learning Differentiable Operators

no code implementations 11 Mar 2022 Alex Tong Lin, Adrian S. Wong, Robert Martin, Stanley J. Osher, Daniel Eckhardt

We provide a method to identify system parameters of dynamical systems, called ID-ODE (Inference by Differentiation and Observing Delay Embeddings).

Time Series, Time Series Analysis
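
A minimal sketch of the delay-embedding step that the method builds on (the embedding dimension d and delay tau are illustrative choices, and the learned-differentiation part of ID-ODE is omitted):

import numpy as np

def delay_embed(x, d=3, tau=40):
    """Map a scalar series x to d-dimensional delay vectors
    (x[t], x[t+tau], ..., x[t+(d-1)*tau])."""
    n = len(x) - (d - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(d)], axis=1)

t = np.linspace(0, 20, 2000)
x = np.sin(t) + 0.5 * np.sin(3 * t)      # observed scalar time series
X = delay_embed(x, d=3, tau=40)
print(X.shape)                           # (1920, 3): points on a reconstructed attractor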

Improving Transformers with Probabilistic Attention Keys

1 code implementation 16 Oct 2021 Tam Nguyen, Tan M. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher

Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head.

Language Modelling
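
One way to write the mixture-of-keys idea for a single query $q$ at one head (the mixture weights $\pi_r$ and a shared bandwidth $\sigma$ are simplifying assumptions rather than the paper's exact parameterization):

$\mathrm{score}(q, j) \;\propto\; \sum_{r} \pi_{r}\, \exp\!\Big(-\tfrac{\|q - k_{j,r}\|^2}{2\sigma^2}\Big),$

so each position $j$ contributes several Gaussian key centers $k_{j,r}$, and the attention weight arises as a mixture responsibility; this lets one head cover matching patterns that would otherwise require several redundant heads.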

Heavy Ball Neural Ordinary Differential Equations

1 code implementation NeurIPS 2021 Hedi Xia, Vai Suliafu, Hangjie Ji, Tan M. Nguyen, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

We propose heavy ball neural ordinary differential equations (HBNODEs), leveraging the continuous limit of the classical momentum accelerated gradient descent, to improve neural ODEs (NODEs) training and inference.

Image Classification
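
The continuous heavy-ball limit referenced above can be sketched as a damped second-order ODE (the damping parameter $\gamma$ and this exact scaling are illustrative; the paper's parameterization may differ):

$\ddot{h}(t) + \gamma\, \dot{h}(t) = f\big(h(t), t; \theta\big) \;\Longleftrightarrow\; \dot{h} = m, \quad \dot{m} = -\gamma m + f(h, t; \theta),$

so a standard first-order ODE solver can integrate the augmented state $(h, m)$, with the momentum variable $m$ playing the same accelerating role as in heavy-ball gradient descent.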

FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention

no code implementations NeurIPS 2021 Tan M. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang

For instance, FMMformers achieve an average classification accuracy of $60.74\%$ over the five Long Range Arena tasks, which is significantly better than the standard transformer's average accuracy of $58.70\%$.

Language Modelling
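
A rough sketch of the banded-plus-low-rank decomposition behind the near-field/far-field split (the band width and rank are illustrative, and the truncated SVD is for exposition only; an efficient implementation never forms the full attention matrix):

import numpy as np

def banded_plus_lowrank(A, band=4, rank=8):
    """Split a dense attention matrix into a banded near-field part and a
    low-rank far-field part (here via truncated SVD, purely for illustration)."""
    n = A.shape[0]
    mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= band
    near = np.where(mask, A, 0.0)
    U, s, Vt = np.linalg.svd(A - near, full_matrices=False)
    far = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return near, far

rng = np.random.default_rng(2)
A = np.exp(rng.standard_normal((64, 64)))
A /= A.sum(axis=1, keepdims=True)                   # stand-in row-stochastic attention
near, far = banded_plus_lowrank(A)
print(np.linalg.norm(A - near - far) / np.linalg.norm(A))   # relative approximation error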

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

2 code implementations NeurIPS 2020 Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Designing deep neural networks is an art that often involves an expensive search over candidate architectures.
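
A minimal sketch of one way to inject momentum into a recurrent update (the placement of the momentum state, the coefficients mu and s, and the tanh cell are assumptions for illustration, not necessarily the exact MomentumRNN recurrence):

import numpy as np

def momentum_rnn_step(h, v, x, Wh, Wx, mu=0.9, s=1.0):
    """Accumulate a momentum state v over the input drive, then update the hidden state h."""
    v = mu * v + s * (Wx @ x)          # momentum on the input contribution
    h = np.tanh(Wh @ h + v)
    return h, v

rng = np.random.default_rng(3)
dh, dx = 16, 8
Wh = 0.1 * rng.standard_normal((dh, dh))
Wx = 0.1 * rng.standard_normal((dh, dx))
h, v = np.zeros(dh), np.zeros(dh)
for x in rng.standard_normal((20, dx)):   # run over a length-20 input sequence
    h, v = momentum_rnn_step(h, v, x, Wh, Wx)
print(h.shape)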

Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets

no code implementations 2 Mar 2020 Thu Dinh, Bao Wang, Andrea L. Bertozzi, Stanley J. Osher

In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning.

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent

1 code implementation 24 Feb 2020 Bao Wang, Tan M. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst.

General Classification, Image Classification
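
A minimal sketch of Nesterov momentum with scheduled restarts on a toy quadratic (the restart period, the k/(k+3) momentum schedule, and the deterministic gradient are illustrative assumptions; the paper works with stochastic gradients for deep networks):

import numpy as np

def nag_with_restarts(grad, x0, lr=0.05, restart_every=40, iters=200):
    """NAG-style iteration whose momentum counter k is reset on a fixed schedule,
    limiting the error that momentum accumulates from inexact gradients."""
    x, x_prev, k = x0.copy(), x0.copy(), 0
    for _ in range(iters):
        beta = k / (k + 3)                 # Nesterov-style momentum coefficient
        y = x + beta * (x - x_prev)        # look-ahead point
        x_prev, x = x, y - lr * grad(y)
        k = 0 if (k + 1) % restart_every == 0 else k + 1
    return x

A = np.diag(np.linspace(1.0, 10.0, 5))
grad = lambda z: A @ z                     # gradient of the quadratic 0.5 * z.T A z
print(nag_with_restarts(grad, np.ones(5)))   # should approach the minimizer at 0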

Alternating the Population and Control Neural Networks to Solve High-Dimensional Stochastic Mean-Field Games

1 code implementation 24 Feb 2020 Alex Tong Lin, Samy Wu Fung, Wuchen Li, Levon Nurbekyan, Stanley J. Osher

By phrasing the problem in this manner, solving the MFG can be interpreted as a special case of training a generative adversarial network (GAN).

Generative Adversarial Network

Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning

1 code implementation 16 Jul 2019 Bao Wang, Stanley J. Osher

The proposed DNN with graph interpolating activation integrates the advantages of both deep learning and manifold learning.

A Deterministic Gradient-Based Approach to Avoid Saddle Points

no code implementations 21 Jan 2019 Lisa Maria Kreusser, Stanley J. Osher, Bao Wang

First-order methods such as gradient descent are usually the methods of choice for training machine learning models.

BIG-bench Machine Learning

ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies

5 code implementations NeurIPS 2019 Bao Wang, Binjie Yuan, Zuoqiang Shi, Stanley J. Osher

However, both the natural accuracy (on clean images) and the robust accuracy (on adversarial images) of the trained robust models are far from satisfactory.

Adversarial Attack, Adversarial Defense

Mathematical Analysis of Adversarial Attacks

no code implementations 15 Nov 2018 Zehao Dou, Stanley J. Osher, Bao Wang

In this paper, we analyze the efficacy of the fast gradient sign method (FGSM) and the Carlini-Wagner L2 (CW-L2) attack.

General Classification
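
For reference, the FGSM perturbation analyzed in the paper is the single gradient-sign step

$x_{\mathrm{adv}} = x + \epsilon\, \operatorname{sign}\!\big(\nabla_x \mathcal{L}(f_\theta(x), y)\big),$

with attack budget $\epsilon$; CW-L2, by contrast, solves an optimization problem for a minimal-norm $\ell_2$ perturbation and is not sketched here.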

Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization

1 code implementation 23 Sep 2018 Bao Wang, Alex T. Lin, Wei Zhu, Penghang Yin, Andrea L. Bertozzi, Stanley J. Osher

We improve the robustness of deep neural nets (DNNs) to adversarial attacks by using an interpolating function as the output activation.

Adversarial Attack, Adversarial Defense, +1

Deep Neural Nets with Interpolating Function as Output Activation

1 code implementation NeurIPS 2018 Bao Wang, Xiyang Luo, Zhen Li, Wei Zhu, Zuoqiang Shi, Stanley J. Osher

We replace the output layer of deep neural nets, typically the softmax function, by a novel interpolating function.
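
A rough sketch of replacing the softmax output with a data-dependent interpolating output (a plain normalized-kernel interpolation from labeled training features; the paper's interpolating function is more sophisticated, so treat the Gaussian kernel and bandwidth as assumptions):

import numpy as np

def interpolating_output(z_test, z_train, y_train, bandwidth=1.0):
    """Predict class probabilities for test features by kernel-weighted
    interpolation of training labels, instead of a learned softmax layer."""
    d2 = ((z_test[:, None, :] - z_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / bandwidth ** 2)
    w /= w.sum(axis=1, keepdims=True)
    onehot = np.eye(y_train.max() + 1)[y_train]
    return w @ onehot                          # rows sum to 1

rng = np.random.default_rng(4)
z_train = rng.standard_normal((100, 5))
y_train = rng.integers(0, 3, 100)
z_test = rng.standard_normal((10, 5))
print(interpolating_output(z_test, z_train, y_train).argmax(axis=1))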

Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs

no code implementations 26 Jul 2012 Braxton Osting, Christoph Brune, Stanley J. Osher

Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify data which maximizes the informativeness of the ranking.

Clustering, Experimental Design, +2
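
The bi-level structure described above can be sketched as follows (the least-squares inner ranking problem and the informativeness objective $\mathcal{I}$ are generic stand-ins for the paper's precise formulation):

$\max_{E' \subseteq E,\ |E'| \le m} \; \mathcal{I}\big(\hat r(E')\big) \quad \text{s.t.} \quad \hat r(E') \in \arg\min_{r} \sum_{(i,j)\in E'} w_{ij}\,\big(r_i - r_j - y_{ij}\big)^2,$

where the inner problem fits a ranking $r$ to pairwise comparisons $y_{ij}$ collected on the edge set $E'$, and the outer problem selects which comparisons to collect so that the fitted ranking is as informative (well-conditioned) as possible.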
