Search Results for author: Jonathan Ragan-Kelley

Found 16 papers, 10 papers with code

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

no code implementations • 7 Feb 2024 • Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon

In this work, we propose Hydra heads, a sequentially dependent, drop-in replacement for standard draft heads that significantly improves speculation accuracy.
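
As a rough illustration of what "sequentially dependent" means here, the toy sketch below contrasts independent draft heads (each sees only the base model's last hidden state) with heads that also condition on the token proposed by the previous head. The shapes, the single-matrix heads, and the names `medusa_draft`/`hydra_draft` are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_heads = 64, 100, 3

# Toy stand-ins for learned weights (hypothetical shapes, not the paper's).
embed = rng.normal(size=(vocab, d_model))
head_w = [rng.normal(size=(2 * d_model, vocab)) for _ in range(n_heads)]

def medusa_draft(h):
    """Independent draft heads: every head sees only the base hidden state h."""
    return [int(np.argmax(h @ w[:d_model])) for w in head_w]

def hydra_draft(h):
    """Sequentially dependent draft heads: head k also sees the embedding of the
    token proposed by head k-1, so later guesses condition on earlier ones."""
    tokens, prev = [], np.zeros(d_model)
    for w in head_w:
        tok = int(np.argmax(np.concatenate([h, prev]) @ w))
        tokens.append(tok)
        prev = embed[tok]
    return tokens

h = rng.normal(size=d_model)          # base model's last hidden state
print("independent draft:", medusa_draft(h))
print("sequential draft: ", hydra_draft(h))
```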

How to guess a gradient

no code implementations • 7 Dec 2023 • Utkarsh Singhal, Brian Cheung, Kartik Chandra, Jonathan Ragan-Kelley, Joshua B. Tenenbaum, Tomaso A. Poggio, Stella X. Yu

We study how to narrow the gap in optimization performance between methods that calculate exact gradients and those that use directional derivatives.
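
The baseline being improved on can be sketched in a few lines: probe the loss along a random direction and scale that direction by the measured directional derivative, which gives an unbiased (if noisy) guess of the gradient. The quadratic objective and the finite-difference probe below are stand-ins, not the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    """Toy objective standing in for a network's training loss."""
    return np.sum((w - 1.0) ** 2)

def directional_derivative(f, w, v, eps=1e-6):
    """Central finite difference of f at w along v (a stand-in for an exact
    forward-mode Jacobian-vector product)."""
    return (f(w + eps * v) - f(w - eps * v)) / (2 * eps)

w = np.zeros(10)
for _ in range(500):
    v = rng.normal(size=w.shape)                       # random probe direction
    g_hat = directional_derivative(loss, w, v) * v     # unbiased gradient guess
    w -= 0.01 * g_hat

print("loss after guessed-gradient descent:", loss(w))
```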

Striped Attention: Faster Ring Attention for Causal Transformers

1 code implementation • 15 Nov 2023 • William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

In experiments running Striped Attention on A100 GPUs and TPUv4s, we are able to achieve up to 1.45x end-to-end throughput improvements over the original Ring Attention algorithm on causal transformer training at a sequence length of 256k.
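
The core trick is a different token-to-device layout: instead of giving each device a contiguous block of the sequence, tokens are dealt out round-robin ("striped"), so the causal mask no longer concentrates work on the devices holding the latest tokens. The toy tally below counts causal (query, key) pairs per device as a rough proxy; the real algorithm balances work within each round of ring attention, which this proxy glosses over.

```python
seq_len, n_devices = 16, 4

def causal_work(partition):
    """(query, key) pairs with key <= query that each device's query shard
    must attend to: a crude proxy for per-device work under a causal mask."""
    return [sum(q + 1 for q in queries) for queries in partition]

# Blockwise partition, as in the original Ring Attention layout.
blocks = [list(range(d * seq_len // n_devices, (d + 1) * seq_len // n_devices))
          for d in range(n_devices)]

# Striped partition: token i goes to device i % n_devices.
stripes = [list(range(d, seq_len, n_devices)) for d in range(n_devices)]

print("block work per device :", causal_work(blocks))   # [10, 26, 42, 58]: skewed
print("stripe work per device:", causal_work(stripes))  # [28, 32, 36, 40]: far more even
```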

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

no code implementations • 7 Oct 2023 • Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference.
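
For concreteness, here is a minimal numpy sketch of the two scaling knobs being compared: unstructured magnitude pruning of an existing weight matrix versus simply instantiating a narrower ("dense-scaled") layer. The pruning criterion and width factor are illustrative; the paper's actual pruning and training recipes are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude weights (unstructured weight pruning)."""
    k = int(sparsity * w.size)
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

def dense_scale(fan_in, fan_out, width_factor):
    """'Dense scaling': simply instantiate a narrower (or wider) layer."""
    return rng.normal(size=(int(fan_in * width_factor), fan_out))

w = rng.normal(size=(512, 512))
print("pruned nonzeros :", np.count_nonzero(magnitude_prune(w, 0.7)))
print("dense-scaled shape:", dense_scale(512, 512, 0.5).shape)
```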

In-Context Learning

Differentiating Metropolis-Hastings to Optimize Intractable Densities

1 code implementation • 13 Jun 2023 • Gaurav Arya, Ruben Seyer, Frank Schäfer, Kartik Chandra, Alexander K. Lew, Mathieu Huot, Vikash K. Mansinghka, Jonathan Ragan-Kelley, Christopher Rackauckas, Moritz Schauer

We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers, allowing us to differentiate through probabilistic inference, even if the model has discrete components within it.
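
For reference, the object being differentiated is an ordinary Metropolis-Hastings sampler; the sketch below is a plain random-walk MH sampler on a toy target with a parameter theta. Differentiating expectations of its output with respect to theta is exactly what the discrete accept/reject step normally prevents and is what the paper's algorithm adds; this sketch shows only the forward sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_density(x, theta):
    """Unnormalized log-density of a toy target: a Gaussian with mean theta."""
    return -0.5 * (x - theta) ** 2

def metropolis_hastings(theta, n_steps=5000, step=1.0):
    """Plain random-walk Metropolis-Hastings for the toy target."""
    x, samples = 0.0, []
    for _ in range(n_steps):
        prop = x + step * rng.normal()
        # Symmetric proposal, so the acceptance ratio is just the density ratio.
        if np.log(rng.uniform()) < log_density(prop, theta) - log_density(x, theta):
            x = prop
        samples.append(x)
    return np.array(samples)

samples = metropolis_hastings(theta=2.0)
print("posterior mean estimate:", samples.mean())   # close to 2.0
```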

Designing Perceptual Puzzles by Differentiating Probabilistic Programs

no code implementations • 26 Apr 2022 • Kartik Chandra, Tzu-Mao Li, Joshua Tenenbaum, Jonathan Ragan-Kelley

We design new visual illusions by finding "adversarial examples" for principled models of human perception -- specifically, for probabilistic models, which treat vision as Bayesian inference.
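
The recipe, stripped to a toy: pick a differentiable model of an observer, then optimize the stimulus to maximize the gap between what the observer reports and the physical ground truth, while keeping the stimulus valid. The sketch below uses a crude gray-world color-constancy heuristic as a stand-in observer and finite-difference gradients; the paper instead differentiates through genuine probabilistic (Bayesian) perception models.

```python
import numpy as np

def perceived_reflectance(image, patch):
    """Toy gray-world observer: estimate the illuminant as the mean luminance
    and divide it out.  A stand-in for the paper's Bayesian observer models."""
    return image[patch] / image.mean()

image = np.full((8, 8), 0.5)                     # flat mid-gray scene
patch = (4, 4)                                   # pixel we hold physically fixed
reference = perceived_reflectance(image, patch)  # appearance on a flat surround

surround = np.ones(image.shape, dtype=bool)
surround[patch] = False
eps, lr = 1e-4, 0.2

def objective(img):
    # How far the patch's perceived reflectance drifts from its reference appearance.
    return abs(perceived_reflectance(img, patch) - reference)

# Adversarial search over the surround only, by finite-difference gradient ascent.
for _ in range(200):
    grad = np.zeros_like(image)
    for idx in zip(*np.nonzero(surround)):
        bumped = image.copy()
        bumped[idx] += eps
        grad[idx] = (objective(bumped) - objective(image)) / eps
    image[surround] = np.clip(image[surround] + lr * grad[surround], 0.0, 1.0)

# The patch's pixel value never changed, but the brightened surround makes the
# toy observer report a much lower reflectance: a simultaneous-contrast illusion.
print("reference appearance:", reference,
      " adversarial appearance:", perceived_reflectance(image, patch))
```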

Color Constancy • Probabilistic Programming

Differentiable Vector Graphics Rasterization for Editing and Learning

1 code implementation • ACM Transactions on Graphics 2020 • Tzu-Mao Li, Michal Lukáč, Michaël Gharbi, Jonathan Ragan-Kelley

We introduce a differentiable rasterizer that bridges the vector graphics and raster image domains, enabling powerful raster-based loss functions, optimization procedures, and machine learning techniques to edit and generate vector content.
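
The idea can be miniaturized: replace the hard inside/outside test of a primitive with a smooth coverage function, and raster-domain losses become differentiable with respect to the vector parameters. The sketch below fits a single soft disc to a raster target; the sigmoid coverage and finite-difference gradients are stand-ins for the paper's rasterizer and its analytic derivatives, and none of this uses the released diffvg code.

```python
import numpy as np

H = W = 64
ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)

def soft_rasterize(cx, cy, r, blur=2.0):
    """Soft coverage of a disc: sigmoid of the signed distance to its boundary."""
    signed = r - np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)   # > 0 inside
    return 1.0 / (1.0 + np.exp(-signed / blur))

# Raster target: a hard disc standing in for an input image to vectorize.
target = (np.sqrt((xs - 40.0) ** 2 + (ys - 24.0) ** 2) < 12.0).astype(np.float64)

def loss(p):
    return np.mean((soft_rasterize(*p) - target) ** 2)

# Gradient descent on the vector parameters (center, radius) through the
# rasterizer; finite differences stand in for analytic derivatives here.
params, h = np.array([30.0, 20.0, 8.0]), 1e-3
for _ in range(400):
    grad = np.array([(loss(params + h * e) - loss(params - h * e)) / (2 * h)
                     for e in np.eye(3)])
    params -= 100.0 * grad

print("fitted (cx, cy, r):", params.round(2))   # approaches (40, 24, 12)
```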

Vector Graphics

DiffTaichi: Differentiable Programming for Physical Simulation

2 code implementations • ICLR 2020 • Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, Frédo Durand

We present DiffTaichi, a new differentiable programming language tailored for building high-performance differentiable physical simulators.
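
DiffTaichi builds on the Taichi language, whose kernels can be recorded on a tape and replayed backward for gradients. A minimal sketch (assuming a recent open-source Taichi release where the autodiff tape is exposed as `ti.ad.Tape`; older releases call it `ti.Tape`) differentiates a tiny spring simulation's final position with respect to its initial velocity.

```python
import taichi as ti

ti.init(arch=ti.cpu)

steps, dt, k = 128, 0.02, 4.0
x = ti.field(dtype=ti.f32, shape=steps, needs_grad=True)   # position per timestep
v = ti.field(dtype=ti.f32, shape=steps, needs_grad=True)   # velocity per timestep
loss = ti.field(dtype=ti.f32, shape=(), needs_grad=True)

@ti.kernel
def step(t: ti.i32):
    # Semi-implicit Euler integration of a unit mass on a spring.
    v[t] = v[t - 1] - dt * k * x[t - 1]
    x[t] = x[t - 1] + dt * v[t]

@ti.kernel
def compute_loss():
    # Squared distance of the final position from a target position of 1.0.
    loss[None] = (x[steps - 1] - 1.0) ** 2

x[0], v[0] = 0.0, 1.0
with ti.ad.Tape(loss=loss):          # records kernel launches, replays them backward
    for t in range(1, steps):
        step(t)
    compute_loss()

print("d loss / d v0 =", v.grad[0])  # gradient w.r.t. the initial velocity
```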

Physical Simulations

Gradient Descent: The Ultimate Optimizer

2 code implementations • 29 Sep 2019 • Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer

This allows us to easily apply the method to other optimizers and hyperparameters (e.g. momentum coefficients).
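
The underlying move is to treat the optimizer's own hyperparameters as parameters and take gradient steps on them too. The sketch below hand-derives the single hypergradient for the learning rate of plain SGD on a toy quadratic; the paper's contribution is obtaining such hypergradients automatically via automatic differentiation (and stacking the trick recursively), which this sketch does not do.

```python
import numpy as np

# Toy quadratic objective standing in for a training loss.
A = np.diag([1.0, 10.0])
def grad(w):
    return A @ w

w = np.array([5.0, 5.0])
alpha, kappa = 0.01, 1e-6        # learning rate and its own (hyper-)learning rate
g_prev = np.zeros_like(w)

for _ in range(200):
    g = grad(w)
    # The previous step was w <- w - alpha * g_prev, so
    #   d loss / d alpha = grad(w) . (d w / d alpha) = -(g . g_prev).
    # Take a gradient step on alpha itself before using it.
    alpha += kappa * float(g @ g_prev)
    w -= alpha * g
    g_prev = g

print("final loss:", 0.5 * float(w @ A @ w), " adapted alpha:", alpha)
```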

BIG-bench Machine Learning • Hyperparameter Optimization

Programming Heterogeneous Systems from an Image Processing DSL

3 code implementations • 28 Oct 2016 • Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, Mark Horowitz

We address this problem by extending the image processing language, Halide, so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the "glue" code needed for the user's application to access this hardware.

Software Engineering

A Systematic Approach to Blocking Convolutional Neural Networks

1 code implementation • 14 Jun 2016 • Xuan Yang, Jing Pu, Blaine Burton Rister, Nikhil Bhagdikar, Stephen Richardson, Shahar Kvatinsky, Jonathan Ragan-Kelley, Ardavan Pedram, Mark Horowitz

Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many computer vision problems, and many researchers have explored optimized implementations.

Blocking

Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging

no code implementations • 22 Apr 2016 • Zachary DeVito, Michael Mara, Michael Zollhöfer, Gilbert Bernstein, Jonathan Ragan-Kelley, Christian Theobalt, Pat Hanrahan, Matthew Fisher, Matthias Nießner

Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes.
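
The kind of problem Opt targets looks like the toy below: a handful of unknowns, a stack of residuals, and a Gauss-Newton loop. Opt generates optimized GPU solvers of this shape from a declarative energy specification; this hand-written numpy version only illustrates the non-linear least squares structure, not Opt's DSL or code generation.

```python
import numpy as np

# Toy non-linear least squares problem: fit y = exp(a * x) + b to noisy samples.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 50)
ys = np.exp(0.8 * xs) + 0.3 + 0.01 * rng.normal(size=xs.size)

def residuals(p):
    a, b = p
    return np.exp(a * xs) + b - ys

def jacobian(p):
    a, _ = p
    return np.stack([xs * np.exp(a * xs), np.ones_like(xs)], axis=1)

# Gauss-Newton: linearize the residuals and solve a linear least-squares
# system for the update at each iteration.
p = np.array([0.0, 0.0])
for _ in range(10):
    r, J = residuals(p), jacobian(p)
    p -= np.linalg.lstsq(J, r, rcond=None)[0]

print("estimated (a, b):", p.round(3))   # close to (0.8, 0.3)
```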
