Search Results for author: Stanley J. Osher

Found 24 papers, 13 papers with code

Wasserstein proximal operators describe score-based generative models and resolve memorization

no code implementations 9 Feb 2024 Benjamin J. Zhang, Siting Liu, Wuchen Li, Markos A. Katsoulakis, Stanley J. Osher

Via a Cole-Hopf transformation and taking advantage of the fact that the cross-entropy can be related to a linear functional of the density, we show that the Hamilton-Jacobi-Bellman (HJB) equation is an uncontrolled Fokker-Planck (FP) equation.

Inductive Bias, Memorization
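
For orientation, a generic form of the Cole-Hopf linearization invoked above (the paper's exact HJB and Fokker-Planck equations may differ; the constant diffusion coefficient $\beta$ here is an illustrative assumption): the substitution $\phi = e^{-u/(2\beta)}$ turns a viscous Hamilton-Jacobi equation into the heat equation,

$\partial_t u + \tfrac{1}{2}|\nabla u|^2 = \beta \Delta u, \qquad \phi = e^{-u/(2\beta)} \;\Longrightarrow\; \partial_t \phi = \beta \Delta \phi.$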

PDE Generalization of In-Context Operator Networks: A Study on 1D Scalar Nonlinear Conservation Laws

1 code implementation 14 Jan 2024 Liu Yang, Stanley J. Osher

We show positive evidence for the second question, i.e., that ICON can generalize well to some PDEs with new forms without any fine-tuning.

Operator learning

Fine-Tune Language Models as Multi-Modal Differential Equation Solvers

1 code implementation 9 Aug 2023 Liu Yang, Siting Liu, Stanley J. Osher

In the growing domain of scientific machine learning, in-context operator learning has shown notable potential for building foundation models: in this framework, the model is trained to learn operators and solve differential equations from prompted data at inference time, without weight updates.

Efficient Neural Network, Language Modelling, +1

In-Context Operator Learning with Data Prompts for Differential Equation Problems

2 code implementations 17 Apr 2023 Liu Yang, Siting Liu, Tingwei Meng, Stanley J. Osher

This paper introduces a new neural-network-based approach, namely In-Context Operator Networks (ICON), to simultaneously learn operators from the prompted data and apply them to new questions during the inference stage, without any weight update.

Operator learning
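
As a rough sketch of the data-prompt idea above (the array shapes, the pairing of conditions with quantities of interest, and the function name make_prompt are simplifying assumptions, not ICON's actual interface):

import numpy as np

# Each demo pairs a "condition" (e.g. a forcing term) with the "quantity of
# interest" (e.g. the induced solution) produced by one and the same operator.
def make_prompt(demo_conditions, demo_qois, query_condition):
    """Stack a few solved examples plus one unsolved query into a single prompt array."""
    examples = [np.concatenate([c, q]) for c, q in zip(demo_conditions, demo_qois)]
    query = np.concatenate([query_condition, np.zeros_like(demo_qois[0])])
    return np.stack(examples + [query])   # shape: (num_demos + 1, cond_dim + qoi_dim)

# Hypothetical usage: five demos of a 1D operator sampled on a 32-point grid.
rng = np.random.default_rng(0)
conds = [rng.standard_normal(32) for _ in range(5)]
qois = [0.1 * np.cumsum(c) for c in conds]          # stand-in "solution" operator
prompt = make_prompt(conds, qois, rng.standard_normal(32))
print(prompt.shape)   # (6, 64); a trained model would infer the query's QoI in context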

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization

no code implementations 1 Aug 2022 Tan Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang

Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence.

Image Generation, Machine Translation
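
To illustrate the quadratic-versus-linear trade-off mentioned above (a generic kernelized linearization with a simple positivity-preserving feature map, not the momentum-based construction of the paper):

import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the n-by-n score matrix costs O(n^2) time and memory."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, feat=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized linearization: forming feat(K)^T V first avoids the n-by-n matrix."""
    KV = feat(K).T @ V                        # (d, d_v), independent of sequence length
    Z = feat(Q) @ feat(K).sum(axis=0)         # per-query normalizer, shape (n,)
    return (feat(Q) @ KV) / Z[:, None]

n, d = 256, 16
Q = K = V = np.random.default_rng(1).standard_normal((n, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)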

Transformer with Fourier Integral Attentions

no code implementations 1 Jun 2022 Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho

Multi-head attention underpins the recent success of transformers, the state-of-the-art models in sequence modeling and beyond.

Image Classification, Language Modelling, +1

Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs

no code implementations 19 Apr 2022 Justin Baker, Hedi Xia, Yiwei Wang, Elena Cherkaev, Akil Narayan, Long Chen, Jack Xin, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Learning neural ODEs often requires solving very stiff ODE systems, which is primarily done with explicit adaptive step-size ODE solvers.

Computational Efficiency
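
For intuition on the proximal view of implicit solvers (a generic gradient-flow illustration, not necessarily the specific solvers proposed in the paper): when the vector field is the negative gradient of a convex potential $F$, one backward-Euler step is exactly a proximal step,

$x_{k+1} = x_k - \Delta t\, \nabla F(x_{k+1}) \;\Longleftrightarrow\; x_{k+1} = \operatorname{prox}_{\Delta t F}(x_k) := \arg\min_x \Big( F(x) + \tfrac{1}{2\Delta t}\|x - x_k\|^2 \Big),$

which stays stable on stiff problems where an explicit step would force a very small $\Delta t$.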

Parameter Inference of Time Series by Delay Embeddings and Learning Differentiable Operators

no code implementations 11 Mar 2022 Alex Tong Lin, Adrian S. Wong, Robert Martin, Stanley J. Osher, Daniel Eckhardt

We provide a method to identify system parameters of dynamical systems, called ID-ODE (Inference by Differentiation and Observing Delay Embeddings).

Time Series, Time Series Analysis
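
A minimal sketch of the delay-embedding step that the method builds on (the embedding dimension d and delay tau are illustrative choices, and the learned-differentiation part of ID-ODE is omitted):

import numpy as np

def delay_embed(x, d=3, tau=40):
    """Map a scalar series x to d-dimensional delay vectors
    (x[t], x[t+tau], ..., x[t+(d-1)*tau])."""
    n = len(x) - (d - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(d)], axis=1)

t = np.linspace(0, 20, 2000)
x = np.sin(t) + 0.5 * np.sin(3 * t)      # observed scalar time series
X = delay_embed(x, d=3, tau=40)
print(X.shape)                           # (1920, 3): points on a reconstructed attractor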

Improving Transformers with Probabilistic Attention Keys

1 code implementation 16 Oct 2021 Tam Nguyen, Tan M. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher

Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head.

Language Modelling
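
One way to write the mixture-of-keys idea for a single query $q$ at one head (the mixture weights $\pi_r$ and a shared bandwidth $\sigma$ are simplifying assumptions rather than the paper's exact parameterization):

$\mathrm{score}(q, j) \;\propto\; \sum_{r} \pi_{r}\, \exp\!\Big(-\tfrac{\|q - k_{j,r}\|^2}{2\sigma^2}\Big),$

so each position $j$ contributes several Gaussian key centers $k_{j,r}$, and the attention weight arises as a mixture responsibility; this lets one head cover matching patterns that would otherwise require several redundant heads.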

Heavy Ball Neural Ordinary Differential Equations

1 code implementation NeurIPS 2021 Hedi Xia, Vai Suliafu, Hangjie Ji, Tan M. Nguyen, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

We propose heavy ball neural ordinary differential equations (HBNODEs), leveraging the continuous limit of the classical momentum accelerated gradient descent, to improve neural ODEs (NODEs) training and inference.

Image Classification
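
The continuous heavy-ball limit referenced above can be sketched as a damped second-order ODE (the damping parameter $\gamma$ and this exact scaling are illustrative; the paper's parameterization may differ):

$\ddot{h}(t) + \gamma\, \dot{h}(t) = f\big(h(t), t; \theta\big) \;\Longleftrightarrow\; \dot{h} = m, \quad \dot{m} = -\gamma m + f(h, t; \theta),$

so a standard first-order ODE solver can integrate the augmented state $(h, m)$, with the momentum variable $m$ playing the same accelerating role as in heavy-ball gradient descent.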

FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention

no code implementations NeurIPS 2021 Tan M. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang

For instance, FMMformers achieve an average classification accuracy of $60.74\%$ over the five Long Range Arena tasks, which is significantly better than the standard transformer's average accuracy of $58.70\%$.

Language Modelling
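
A rough sketch of the banded-plus-low-rank decomposition behind the near-field/far-field split (the band width and rank are illustrative, and the truncated SVD is for exposition only; an efficient implementation never forms the full attention matrix):

import numpy as np

def banded_plus_lowrank(A, band=4, rank=8):
    """Split a dense attention matrix into a banded near-field part and a
    low-rank far-field part (here via truncated SVD, purely for illustration)."""
    n = A.shape[0]
    mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= band
    near = np.where(mask, A, 0.0)
    U, s, Vt = np.linalg.svd(A - near, full_matrices=False)
    far = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return near, far

rng = np.random.default_rng(2)
A = np.exp(rng.standard_normal((64, 64)))
A /= A.sum(axis=1, keepdims=True)                   # stand-in row-stochastic attention
near, far = banded_plus_lowrank(A)
print(np.linalg.norm(A - near - far) / np.linalg.norm(A))   # relative approximation error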

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

2 code implementations NeurIPS 2020 Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Designing deep neural networks is an art that often involves an expensive search over candidate architectures.
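
A minimal sketch of one way to inject momentum into a recurrent update (the placement of the momentum state, the coefficients mu and s, and the tanh cell are assumptions for illustration, not necessarily the exact MomentumRNN recurrence):

import numpy as np

def momentum_rnn_step(h, v, x, Wh, Wx, mu=0.9, s=1.0):
    """Accumulate a momentum state v over the input drive, then update the hidden state h."""
    v = mu * v + s * (Wx @ x)          # momentum on the input contribution
    h = np.tanh(Wh @ h + v)
    return h, v

rng = np.random.default_rng(3)
dh, dx = 16, 8
Wh = 0.1 * rng.standard_normal((dh, dh))
Wx = 0.1 * rng.standard_normal((dh, dx))
h, v = np.zeros(dh), np.zeros(dh)
for x in rng.standard_normal((20, dx)):   # run over a length-20 input sequence
    h, v = momentum_rnn_step(h, v, x, Wh, Wx)
print(h.shape)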

Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets

no code implementations 2 Mar 2020 Thu Dinh, Bao Wang, Andrea L. Bertozzi, Stanley J. Osher

In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning.

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent

1 code implementation 24 Feb 2020 Bao Wang, Tan M. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst.

General Classification, Image Classification
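
A minimal sketch of Nesterov momentum with scheduled restarts on a toy quadratic (the restart period, the k/(k+3) momentum schedule, and the deterministic gradient are illustrative assumptions; the paper works with stochastic gradients for deep networks):

import numpy as np

def nag_with_restarts(grad, x0, lr=0.05, restart_every=40, iters=200):
    """NAG-style iteration whose momentum counter k is reset on a fixed schedule,
    limiting the error that momentum accumulates from inexact gradients."""
    x, x_prev, k = x0.copy(), x0.copy(), 0
    for _ in range(iters):
        beta = k / (k + 3)                 # Nesterov-style momentum coefficient
        y = x + beta * (x - x_prev)        # look-ahead point
        x_prev, x = x, y - lr * grad(y)
        k = 0 if (k + 1) % restart_every == 0 else k + 1
    return x

A = np.diag(np.linspace(1.0, 10.0, 5))
grad = lambda z: A @ z                     # gradient of the quadratic 0.5 * z.T A z
print(nag_with_restarts(grad, np.ones(5)))   # should approach the minimizer at 0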

Alternating the Population and Control Neural Networks to Solve High-Dimensional Stochastic Mean-Field Games

1 code implementation 24 Feb 2020 Alex Tong Lin, Samy Wu Fung, Wuchen Li, Levon Nurbekyan, Stanley J. Osher

By phrasing the problem in this manner, solving the MFG can be interpreted as a special case of training a generative adversarial network (GAN).

Generative Adversarial Network

Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning

1 code implementation 16 Jul 2019 Bao Wang, Stanley J. Osher

The proposed DNN with graph interpolating activation integrates the advantages of both deep learning and manifold learning.

A Deterministic Gradient-Based Approach to Avoid Saddle Points

no code implementations 21 Jan 2019 Lisa Maria Kreusser, Stanley J. Osher, Bao Wang

First-order methods such as gradient descent are usually the methods of choice for training machine learning models.

BIG-bench Machine Learning

ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies

5 code implementations NeurIPS 2019 Bao Wang, Binjie Yuan, Zuoqiang Shi, Stanley J. Osher

However, both the natural accuracy (on clean images) and the robust accuracy (on adversarial images) of the trained robust models are far from satisfactory.

Adversarial Attack, Adversarial Defense

Mathematical Analysis of Adversarial Attacks

no code implementations 15 Nov 2018 Zehao Dou, Stanley J. Osher, Bao Wang

In this paper, we analyze the efficacy of the fast gradient sign method (FGSM) and the Carlini-Wagner L2 (CW-L2) attack.

General Classification
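
For reference, the FGSM perturbation analyzed in the paper is the single gradient-sign step

$x_{\mathrm{adv}} = x + \epsilon\, \operatorname{sign}\!\big(\nabla_x \mathcal{L}(f_\theta(x), y)\big),$

with attack budget $\epsilon$; CW-L2, by contrast, solves an optimization problem for a minimal-norm $\ell_2$ perturbation and is not sketched here.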

Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization

1 code implementation 23 Sep 2018 Bao Wang, Alex T. Lin, Wei Zhu, Penghang Yin, Andrea L. Bertozzi, Stanley J. Osher

We improve the robustness of deep neural nets (DNNs) to adversarial attacks by using an interpolating function as the output activation.

Adversarial Attack, Adversarial Defense, +1

Deep Neural Nets with Interpolating Function as Output Activation

1 code implementation NeurIPS 2018 Bao Wang, Xiyang Luo, Zhen Li, Wei Zhu, Zuoqiang Shi, Stanley J. Osher

We replace the output layer of deep neural nets, typically the softmax function, by a novel interpolating function.
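
A rough sketch of replacing the softmax output with a data-dependent interpolating output (a plain normalized-kernel interpolation from labeled training features; the paper's interpolating function is more sophisticated, so treat the Gaussian kernel and bandwidth as assumptions):

import numpy as np

def interpolating_output(z_test, z_train, y_train, bandwidth=1.0):
    """Predict class probabilities for test features by kernel-weighted
    interpolation of training labels, instead of a learned softmax layer."""
    d2 = ((z_test[:, None, :] - z_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / bandwidth ** 2)
    w /= w.sum(axis=1, keepdims=True)
    onehot = np.eye(y_train.max() + 1)[y_train]
    return w @ onehot                          # rows sum to 1

rng = np.random.default_rng(4)
z_train = rng.standard_normal((100, 5))
y_train = rng.integers(0, 3, 100)
z_test = rng.standard_normal((10, 5))
print(interpolating_output(z_test, z_train, y_train).argmax(axis=1))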

Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs

no code implementations 26 Jul 2012 Braxton Osting, Christoph Brune, Stanley J. Osher

Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify data which maximizes the informativeness of the ranking.

Clustering, Experimental Design, +2
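
The bi-level structure described above can be sketched as follows (the least-squares inner ranking problem and the informativeness objective $\mathcal{I}$ are generic stand-ins for the paper's precise formulation):

$\max_{E' \subseteq E,\ |E'| \le m} \; \mathcal{I}\big(\hat r(E')\big) \quad \text{s.t.} \quad \hat r(E') \in \arg\min_{r} \sum_{(i,j)\in E'} w_{ij}\,\big(r_i - r_j - y_{ij}\big)^2,$

where the inner problem fits a ranking $r$ to pairwise comparisons $y_{ij}$ collected on the edge set $E'$, and the outer problem selects which comparisons to collect so that the fitted ranking is as informative (well-conditioned) as possible.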
