Search Results for author: Kunle Olukotun

Found 20 papers, 3 papers with code

BaCO: A Fast and Portable Bayesian Compiler Optimization Framework

1 code implementation • 1 Dec 2022 • Erik Hellsten, Artur Souza, Johannes Lenfers, Rubens Lacouture, Olivia Hsu, Adel Ejjeh, Fredrik Kjolstad, Michel Steuwer, Kunle Olukotun, Luigi Nardi

We introduce the Bayesian Compiler Optimization framework (BaCO), a general purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs.

Compiler Optimization

Efficient Memory Partitioning in Software Defined Hardware

no code implementations • 2 Feb 2022 • Matthew Feldman, Tian Zhao, Kunle Olukotun

As programmers turn to software-defined hardware (SDH) to maintain a high level of productivity while programming hardware to run complex algorithms, heavy-lifting must be done by the compiler to automatically partition on-chip arrays.

Prior-guided Bayesian Optimization

no code implementations • 28 Sep 2020 • Artur Souza, Luigi Nardi, Leonardo Oliveira, Kunle Olukotun, Marius Lindauer, Frank Hutter

While Bayesian Optimization (BO) is a very popular method for optimizing expensive black-box functions, it fails to leverage the experience of domain experts.

Bayesian Optimization
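The idea of steering an expensive search toward regions a domain expert believes contain the optimum can be sketched with a toy example. This is only an illustration of prior-weighted candidate selection, not the paper's method; `objective` and `prior_weight` are hypothetical stand-ins for a real black-box function and a real acquisition function.

```python
import math
import random

def objective(x):
    # Hypothetical expensive black-box function, optimum at x = 2.0.
    return (x - 2.0) ** 2

def prior_weight(x, mu=2.0, sigma=1.0):
    # Expert's (Gaussian) belief about where the optimum lies.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def prior_guided_search(n_iters=200, seed=0):
    """At each step, evaluate the candidate the prior favors most.

    A stand-in for weighting a Bayesian-optimization acquisition
    function by a user-supplied prior over the optimum's location.
    """
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_iters):
        candidates = [rng.uniform(-5.0, 5.0) for _ in range(8)]
        x = max(candidates, key=prior_weight)
        y = objective(x)  # only the chosen candidate is evaluated
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

best_x, best_y = prior_guided_search()
```

A good prior concentrates evaluations near the true optimum; a misleading prior would slow the search, which is why real systems decay the prior's influence over time.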

Bayesian Optimization with a Prior for the Optimum

no code implementations • 25 Jun 2020 • Artur Souza, Luigi Nardi, Leonardo B. Oliveira, Kunle Olukotun, Marius Lindauer, Frank Hutter

We show that BOPrO is around 6.67x faster than state-of-the-art methods on a common suite of benchmarks, and achieves a new state-of-the-art performance on a real-world hardware design application.

Bayesian Optimization

Taurus: A Data Plane Architecture for Per-Packet ML

no code implementations • 12 Feb 2020 • Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Ishan Gaur, Kunle Olukotun

Emerging applications -- cloud computing, the internet of things, and augmented/virtual reality -- demand responsive, secure, and scalable datacenter networks.

Anomaly Detection • BIG-bench Machine Learning • +2

Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator

no code implementations • 26 Sep 2019 • Tian Zhao, Yaqi Zhang, Kunle Olukotun

Recurrent Neural Network (RNN) applications form a major class of AI-powered, low-latency data center workloads.

Polystore++: Accelerated Polystore System for Heterogeneous Workloads

no code implementations • 24 May 2019 • Rekha Singhal, Nathan Zhang, Luigi Nardi, Muhammad Shahbaz, Kunle Olukotun

Modern real-time business analytics consist of heterogeneous workloads (e.g., database queries, graph processing, and machine learning).

DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning

1 code implementation • 26 Apr 2019 • Artur Souza, Leonardo B. Oliveira, Sabine Hollatz, Matt Feldman, Kunle Olukotun, James M. Holton, Aina E. Cohen, Luigi Nardi

In this paper, we introduce a new serial crystallography dataset comprised of real and synthetic images; the synthetic images are generated through the use of a simulator that is both scalable and accurate.

AutoML • BIG-bench Machine Learning • +1

Practical Design Space Exploration

no code implementations • 11 Oct 2018 • Luigi Nardi, David Koeplinger, Kunle Olukotun

The proposed methodology follows a white-box model which is simple to understand and interpret (unlike, for example, neural networks) and can be used by the user to better understand the results of the automatic search.

DiffraNet: Automatic Classification of Serial Crystallography Diffraction Patterns

no code implementations • 27 Sep 2018 • Artur Souza, Leonardo B. Oliveira, Sabine Hollatz, Matt Feldman, Kunle Olukotun, James M. Holton, Aina E. Cohen, Luigi Nardi

In this paper, we introduce a new serial crystallography dataset generated through the use of a simulator; the synthetic images are labeled and they are both scalable and accurate.

AutoML • Classification

Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

no code implementations • 4 Jun 2018 • Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Re, Matei Zaharia

In this work, we analyze the entries from DAWNBench, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries.

Benchmarking • BIG-bench Machine Learning

High-Accuracy Low-Precision Training

1 code implementation • 9 Mar 2018 • Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.

Quantization • Vocal Bursts Intensity Prediction
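The abstract's notion of low-precision computation can be sketched with a generic fixed-point quantizer. This is an illustration of quantization with stochastic rounding, a common ingredient in low-precision training, and not the paper's bit-centering algorithm; `quantize` and its parameters are made up for this example.

```python
import math
import random

def quantize(x, scale=1 / 256, bits=8, rng=None):
    """Round x onto a signed fixed-point grid (step = scale, width = bits).

    With rng set, uses stochastic rounding, which is unbiased in
    expectation -- a property that matters for low-precision SGD, where
    deterministic rounding bias can accumulate across many updates.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = x / scale
    if rng is None:
        q = round(q)              # nearest-grid-point (deterministic)
    else:
        f = math.floor(q)         # stochastic rounding: round up with
        q = f + (1 if rng.random() < q - f else 0)  # prob = fractional part
    return max(lo, min(hi, int(q))) * scale
```

For example, `quantize(0.25)` is exact (0.25 lies on the 1/256 grid), while any input within range is off by at most half a grid step.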

Infrastructure for Usable Machine Learning: The Stanford DAWN Project

no code implementations • 22 May 2017 • Peter Bailis, Kunle Olukotun, Christopher Re, Matei Zaharia

Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations.

BIG-bench Machine Learning

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling

no code implementations • 24 Feb 2016 • Christopher De Sa, Kunle Olukotun, Christopher Ré

Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions.
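The technique the abstract names — alternating draws from each variable's conditional to estimate marginals — can be shown on a standard bivariate normal, where both conditionals are known in closed form. A minimal sketch (not the paper's asynchronous variant):

```python
import math
import random

def gibbs_bivariate_normal(rho=0.8, n_samples=20000, burn_in=1000, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Each conditional is itself normal: x | y ~ N(rho * y, 1 - rho^2),
    and symmetrically for y | x, so we alternate exact conditional draws
    and collect x after a burn-in period.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho ** 2)
    x, y = 0.0, 0.0
    xs = []
    for step in range(n_samples + burn_in):
        x = rng.gauss(rho * y, sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, sd)  # draw y from p(y | x)
        if step >= burn_in:
            xs.append(x)
    return xs

samples = gibbs_bivariate_normal()
mean_x = sum(samples) / len(samples)
```

The collected `x` values approximate the marginal N(0, 1); how quickly such chains forget their starting point (mixing) is exactly what the paper analyzes in the asynchronous setting.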

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width

no code implementations • NeurIPS 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

no code implementations • 22 Jun 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic.

Matrix Completion
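The Hogwild!-style pattern the abstract refers to — threads applying sparse SGD updates to shared parameters without locks — can be sketched in a toy form. This is an illustration of the lock-free idea only, not the paper's analysis or its Buckwild! low-precision variant; the one-hot data and `hogwild_sgd` are made up for this example.

```python
import random
import threading

def hogwild_sgd(data, w, lr=0.05, sweeps=20, n_threads=4):
    """Hogwild!-style SGD: threads write to the shared weights w with no lock.

    data holds (index, feature, target) triples; each example touches
    only one coordinate, so concurrent updates rarely conflict -- the
    sparse setting in which lock-free SGD is known to behave well.
    """
    def worker(seed):
        rng = random.Random(seed)
        for _ in range(sweeps * len(data)):
            i, x_i, y_i = data[rng.randrange(len(data))]
            grad = 2 * (w[i] * x_i - y_i) * x_i  # least-squares gradient
            w[i] -= lr * grad                    # racy read-modify-write

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

# Fit w[i] toward y_i = i on one-hot examples, concurrently.
data = [(i, 1.0, float(i)) for i in range(8)]
w = hogwild_sgd(data, [0.0] * 8)
```

Despite the unsynchronized writes, each coordinate still contracts toward its target, because every update moves the weight a fixed fraction of the way to the correct value regardless of interleaving.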

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

no code implementations • 5 Nov 2014 • Christopher De Sa, Kunle Olukotun, Christopher Ré

Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation.

Matrix Completion
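The setup the abstract describes — SGD on a low-rank factorization — can be sketched for matrix completion. This is the standard nonconvex factorization recipe, not the paper's specific algorithm or step-size schedule; `sgd_low_rank` and the toy rank-1 matrix are made up for this example.

```python
import random

def sgd_low_rank(entries, n, m, rank=2, lr=0.02, epochs=800, seed=0):
    """SGD on the nonconvex low-rank factorization M ~= U V^T.

    Each observed entry (i, j, val) yields a cheap update that touches
    only row U[i] and row V[j], which is what makes this approach fast
    for matrix completion from a sparse set of observations.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(epochs):
        for i, j, val in entries:
            err = sum(U[i][k] * V[j][k] for k in range(rank)) - val
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] -= lr * err * v
                V[j][k] -= lr * err * u
    return U, V

# Recover a rank-1 matrix M[i][j] = a[i] * b[j] from its entries.
a, b = [1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0, 1.5]
entries = [(i, j, a[i] * b[j]) for i in range(4) for j in range(4)]
U, V = sgd_low_rank(entries, 4, 4)
max_err = max(
    abs(sum(U[i][k] * V[j][k] for k in range(2)) - a[i] * b[j])
    for i in range(4) for j in range(4)
)
```

The objective is nonconvex in (U, V), yet SGD from a small random initialization typically recovers the matrix here — the global-convergence behavior the paper studies.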

Utilizing Static Analysis and Code Generation to Accelerate Neural Networks

no code implementations • 27 Jun 2012 • Lawrence McAfee, Kunle Olukotun

In this paper, we present SONNC, a compiler for NNs that utilizes static analysis to generate optimized parallel code.

C++ code • Code Generation
