Search Results for author: Michael Mahoney

Found 14 papers, 7 papers with code

Error Estimation for Sketched SVD

no code implementations ICML 2020 Miles Lopes, N. Benjamin Erichson, Michael Mahoney

In order to compute fast approximations to the singular value decompositions (SVD) of very large matrices, randomized sketching algorithms have become a leading approach.
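
For context, a minimal sketch of the randomized-sketching approach to the SVD that the paper builds on (this shows the generic sketch-then-factor recipe, not the paper's bootstrap error-estimation method; all sizes are arbitrary):

```python
# Minimal randomized sketched SVD: compress A with a Gaussian test matrix,
# factor the small sketch, and lift the factors back to the original space.
import numpy as np

def sketched_svd(A, k, oversample=10, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                     # basis for the dominant column space
    B = Q.T @ A                                        # small (k + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]              # approximate top-k SVD of A

A = np.random.default_rng(1).standard_normal((2000, 300))
U, s, Vt = sketched_svd(A, k=10)
print(s[:3])   # approximate leading singular values
```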

What’s Hidden in a One-layer Randomly Weighted Transformer?

1 code implementation EMNLP 2021 Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael Mahoney

Hidden within a one-layer randomly weighted Transformer, we find subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14.

Machine Translation · Translation
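
A toy illustration of the subnetwork idea behind the paper: keep only the top-scoring entries of a fixed random weight matrix via a binary mask. The scores below are random stand-ins; the paper learns them while the weights themselves stay random and untrained.

```python
# Toy illustration only: masking a fixed random weight matrix down to a
# subnetwork. Scores here are random placeholders for learned per-weight scores.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))       # fixed random weights, never trained
scores = rng.standard_normal(W.shape)     # stand-in for learned importance scores
keep = 0.5                                # keep the top 50% of weights

threshold = np.quantile(scores, 1.0 - keep)
mask = (scores >= threshold).astype(W.dtype)
W_sub = W * mask                          # the "hidden" subnetwork actually used

print(f"fraction of weights kept: {mask.mean():.2f}")
```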

GACT: Activation Compressed Training for Generic Network Architectures

1 code implementation 22 Jun 2022 Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint.
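
A minimal, generic sketch of the activation-compressed-training idea (not GACT's implementation): quantize the activation stashed for the backward pass to 8 bits and dequantize it only when the gradient is computed, trading a little precision for training memory.

```python
# Generic activation compression sketch: the forward pass saves an int8 copy of
# the activation; the backward pass dequantizes it on demand.
import torch

class CompressedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Per-tensor 8-bit quantization of the activation saved for backward.
        scale = max(float(x.abs().max()), 1e-8) / 127.0
        ctx.scale = scale
        ctx.save_for_backward((x / scale).round().to(torch.int8))
        return x.clamp(min=0.0)

    @staticmethod
    def backward(ctx, grad_out):
        (x_q,) = ctx.saved_tensors
        x_hat = x_q.to(grad_out.dtype) * ctx.scale     # dequantize only when needed
        return grad_out * (x_hat > 0).to(grad_out.dtype)

x = torch.randn(4, 8, requires_grad=True)
CompressedReLU.apply(x).sum().backward()
print(x.grad.abs().mean())
```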

LocalNewton: Reducing Communication Bottleneck for Distributed Learning

no code implementations 16 May 2021 Vipul Gupta, Avishek Ghosh, Michal Derezinski, Rajiv Khanna, Kannan Ramchandran, Michael Mahoney

To enhance practicability, we devise an adaptive scheme to choose L, and we show that this reduces the number of local iterations in worker machines between two model synchronizations as the training proceeds, successively refining the model quality at the master.

Distributed Optimization
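
A simplified sketch of the local-Newton pattern (not the paper's exact algorithm or its adaptive rule for choosing L): each worker runs L local Newton steps on its own data shard between synchronizations, and the master averages the local models.

```python
# Simulated LocalNewton-style training on logistic regression: L local Newton
# steps per worker between model synchronizations at the master.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d, workers, L = 4000, 5, 4, 3
X = rng.standard_normal((n, d))
y = (sigmoid(X @ rng.standard_normal(d)) > 0.5).astype(float)
shards = np.array_split(np.arange(n), workers)

w = np.zeros(d)
for _ in range(5):                                 # communication rounds
    local = []
    for idx in shards:
        Xi, yi, wi = X[idx], y[idx], w.copy()
        for _ in range(L):                         # L local Newton steps, no comms
            p = sigmoid(Xi @ wi)
            g = Xi.T @ (p - yi) / len(idx)
            H = Xi.T @ (Xi * (p * (1 - p))[:, None]) / len(idx) + 1e-6 * np.eye(d)
            wi -= np.linalg.solve(H, g)
        local.append(wi)
    w = np.mean(local, axis=0)                     # synchronize at the master
print(((sigmoid(X @ w) > 0.5) == y).mean())        # training accuracy
```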

PyHessian: Neural Networks Through the Lens of the Hessian

2 code implementations 16 Dec 2019 Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney

To illustrate this, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks.
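
A generic sketch of the kind of Hessian analysis such a tool automates (this is not PyHessian's API): power iteration with Hessian-vector products estimates the top eigenvalue of the loss Hessian without ever forming the Hessian.

```python
# Top Hessian eigenvalue via power iteration on Hessian-vector products,
# computed with a second backward pass through the gradient.
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

def hvp(v):
    # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
    gv = sum((g * vi).sum() for g, vi in zip(grads, v))
    return torch.autograd.grad(gv, params, retain_graph=True)

v = [torch.randn_like(p) for p in params]
for _ in range(20):                                  # power iteration
    Hv = hvp(v)
    norm = torch.sqrt(sum((h ** 2).sum() for h in Hv))
    v = [h / norm for h in Hv]

Hv = hvp(v)
top_eig = sum((h * vi).sum() for h, vi in zip(Hv, v))   # Rayleigh quotient
print(float(top_eig))
```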

ANODEV2: A Coupled Neural ODE Evolution Framework

no code implementations 10 Jun 2019 Tianjun Zhang, Zhewei Yao, Amir Gholami, Kurt Keutzer, Joseph Gonzalez, George Biros, Michael Mahoney

It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE).
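
A minimal illustration of that observation: the residual update x + f(x) is exactly one explicit Euler step, with unit step size, for the ODE dx/dt = f(x).

```python
# A residual block's update coincides with an explicit Euler step of size 1
# applied to dx/dt = f(x).
import numpy as np

def f(x):                        # stand-in for a residual branch
    return np.tanh(x)

def residual_block(x):
    return x + f(x)              # ResNet-style update

def euler_step(x, h=1.0):
    return x + h * f(x)          # explicit Euler step for dx/dt = f(x)

x = np.array([0.3, -1.2])
print(residual_block(x), euler_step(x))   # identical when h = 1
```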

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

1 code implementation ICCV 2019 Zhen Dong, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer

Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision.

Quantization
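
A toy sketch of the Hessian-aware intuition only (the sensitivities and bit choices below are made up, and this is not HAWQ's actual assignment or ordering procedure): blocks with larger top Hessian eigenvalues are treated as more sensitive, so they keep higher precision and are fine-tuned first.

```python
# Hypothetical example: rank blocks by a Hessian-based sensitivity and assign
# bit-widths and fine-tuning order from most to least sensitive.
import numpy as np

blocks = ["conv1", "block2", "block3", "fc"]
top_hessian_eig = np.array([12.0, 3.5, 0.8, 0.2])    # hypothetical sensitivities

order = np.argsort(-top_hessian_eig)                  # most sensitive first
bit_choices = [8, 8, 4, 2]                            # hypothetical bit budget
assignment = {blocks[i]: bit_choices[rank] for rank, i in enumerate(order)}

print("fine-tuning order:", [blocks[i] for i in order])
print("bit-widths:", assignment)
```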

Trust Region Based Adversarial Attack on Neural Networks

2 code implementations CVPR 2019 Zhewei Yao, Amir Gholami, Peng Xu, Kurt Keutzer, Michael Mahoney

To address this problem, we present a new family of trust region based adversarial attacks, with the goal of computing adversarial perturbations efficiently.

Adversarial Attack
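
A heavily simplified, generic trust-region-style attack loop (a sketch of classical trust-region mechanics, not the paper's method): step to the boundary of a radius eps, then grow or shrink eps depending on how well the first-order model predicted the actual loss change.

```python
# Generic trust-region-flavored perturbation loop on a toy classifier: the
# radius eps adapts to the ratio of actual vs. predicted loss increase.
import torch

model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
loss_fn = torch.nn.CrossEntropyLoss()
x, label = torch.randn(1, 20), torch.tensor([3])

x_adv, eps = x.clone(), 0.05
for _ in range(10):
    x_adv.requires_grad_(True)
    loss = loss_fn(model(x_adv), label)
    (grad,) = torch.autograd.grad(loss, x_adv)

    step = eps * grad / (grad.norm() + 1e-12)            # move to the trust-region boundary
    with torch.no_grad():
        new_loss = loss_fn(model(x_adv + step), label)
        predicted = (grad * step).sum()                  # first-order predicted increase
        ratio = (new_loss - loss) / (predicted + 1e-12)
        eps = eps * 2.0 if ratio > 0.5 else eps * 0.5    # adapt the radius
        x_adv = (x_adv + step).detach()

print((x_adv - x).norm())
```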

Parameter Re-Initialization through Cyclical Batch Size Schedules

no code implementations 4 Dec 2018 Norman Mu, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney

We demonstrate the ability of our method to improve language modeling performance by up to 7.91 perplexity and reduce training iterations by up to 61%, in addition to its flexibility in enabling snapshot ensembling and use with adversarial training.

General Classification · Image Classification · +2
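
A minimal sketch of a cyclical batch size schedule (the cycle length and sizes are arbitrary, not the paper's settings): the batch size ramps up within each cycle, and cycle boundaries are natural points for snapshot ensembling or parameter re-initialization.

```python
# Simple cyclical batch size schedule: the batch size steps up within each
# cycle and resets at the start of the next one.
def cyclical_batch_size(step, cycle_len=1000, sizes=(128, 256, 512, 1024)):
    phase = (step % cycle_len) / cycle_len            # position within the cycle
    return sizes[min(int(phase * len(sizes)), len(sizes) - 1)]

for step in (0, 300, 600, 900, 1000):
    print(step, cyclical_batch_size(step))
```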

Large batch size training of neural networks with adversarial training and second-order information

1 code implementation ICLR 2019 Zhewei Yao, Amir Gholami, Daiyaan Arfeen, Richard Liaw, Joseph Gonzalez, Kurt Keutzer, Michael Mahoney

Our method exceeds the performance of existing solutions in terms of both accuracy and the number of SGD iterations (up to 1% and 5×, respectively).

Second-order methods

Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares -- ICML

no code implementations 25 May 2015 Garvesh Raskutti, Michael Mahoney

We then consider the statistical prediction efficiency (PE) and the statistical residual efficiency (RE) of the sketched LS estimator; and we use our framework to provide upper bounds for several types of random projection and random sampling algorithms.
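
A small numerical illustration (not the paper's theory): solve least squares on a Gaussian-sketched problem (SX, Sy) and compare its prediction and residual errors to those of the full OLS solution, which is what the PE and RE quantities measure as ratios in expectation.

```python
# Sketched least squares with a Gaussian random projection, and empirical
# prediction-efficiency and residual-efficiency style ratios vs. full OLS.
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 5000, 20, 200                       # p < r << n
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

S = rng.standard_normal((r, n)) / np.sqrt(r)   # Gaussian random projection
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_s = np.linalg.lstsq(S @ X, S @ y, rcond=None)[0]

pe = np.sum((X @ (beta_s - beta)) ** 2) / np.sum((X @ (beta_ols - beta)) ** 2)
re = np.sum((y - X @ beta_s) ** 2) / np.sum((y - X @ beta_ols) ** 2)
print(f"empirical PE ratio: {pe:.2f}, empirical RE ratio: {re:.3f}")
```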

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

no code implementations 29 Dec 2014 Haim Avron, Vikas Sindhwani, Jiyan Yang, Michael Mahoney

These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., Gaussian kernel).
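
A compact sketch of the quasi-Monte Carlo feature-map idea for the Gaussian kernel: replace the i.i.d. Gaussian frequencies of ordinary random Fourier features with a low-discrepancy (Halton) sequence pushed through the Gaussian inverse CDF. Sequence choice and scrambling details differ from the paper.

```python
# Quasi-Monte Carlo Fourier features: Halton points mapped through the Gaussian
# inverse CDF give quasi-random frequencies for the cosine feature map.
import numpy as np
from scipy.stats import norm, qmc

def qmc_fourier_features(X, D=256, sigma=1.0, seed=0):
    d = X.shape[1]
    halton = qmc.Halton(d=d, seed=seed).random(D)       # low-discrepancy points in [0,1]^d
    halton = np.clip(halton, 1e-7, 1 - 1e-7)            # keep the inverse CDF finite
    W = norm.ppf(halton) / sigma                         # quasi-random Gaussian frequencies
    b = np.random.default_rng(seed).uniform(0, 2 * np.pi, D)
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

X = np.random.default_rng(1).standard_normal((5, 3))
Z = qmc_fourier_features(X)
print(Z @ Z.T)    # approximates the Gaussian kernel matrix of X
```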

A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

no code implementations 23 Jun 2014 Garvesh Raskutti, Michael Mahoney

Prior results show that, when using sketching matrices such as random projections and leverage-score sampling algorithms, with $p < r \ll n$, the WC error is the same as solving the original problem, up to a small constant.
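
A short sketch of leverage-score row sampling, one of the sketching schemes the analysis covers (just the basic recipe, not the paper's statistical analysis): sample r rows with probability proportional to their leverage scores, rescale, and solve the smaller problem.

```python
# Leverage-score sampling for least squares: rows are sampled proportionally to
# their leverage scores and rescaled before solving the reduced problem.
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 5000, 20, 200
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

U, _, _ = np.linalg.svd(X, full_matrices=False)
lev = np.sum(U ** 2, axis=1)                      # leverage scores (sum to p)
probs = lev / lev.sum()

idx = rng.choice(n, size=r, replace=True, p=probs)
scale = 1.0 / np.sqrt(r * probs[idx])             # importance-sampling rescaling
beta_s = np.linalg.lstsq(X[idx] * scale[:, None], y[idx] * scale, rcond=None)[0]
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(beta_s - beta_ols))
```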
