Search Results for author: Jimmy Ba

Found 66 papers, 40 papers with code

Improving Transformer Optimization Through Better Initialization

1 code implementation • ICML 2020 • Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs

As Transformer models are becoming larger and more expensive to train, recent research has focused on understanding and improving optimization in these models.

Language Modelling Machine Translation +1

Paper
Code

Improving Transformer Optimization Through Better Initialization

1 code implementation • ICML 2020 • Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs

As Transformer models are becoming larger and more expensive to train, recent research has focused on understanding and improving optimization in these models.

Language Modelling Machine Translation +1

Paper
Code

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

no code implementations • 5 Mar 2024 • Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Sam Marks, Oam Patel, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks

To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs.

Multiple-choice

Paper
Add Code

Using Large Language Models for Hyperparameter Optimization

no code implementations • 7 Dec 2023 • Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba

This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO).

Bayesian Optimization Decision Making +1

Paper
Add Code

OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

2 code implementations • 10 Oct 2023 • Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba

We hope that our dataset, openly released on the Hugging Face Hub, will help spur advances in the reasoning abilities of large language models.

555

Paper
Code

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

1 code implementation • 25 Sep 2023 • Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks.

Language Modelling valid

Paper
Code

STEVE-1: A Generative Model for Text-to-Behavior in Minecraft

no code implementations • NeurIPS 2023 • Shalev Lifshitz, Keiran Paster, Harris Chan, Jimmy Ba, Sheila Mcilraith

Constructing AI models that respond to text instructions is challenging, especially for sequential decision-making tasks.

Decision Making Image Generation +1

Paper
Add Code

Training on Thin Air: Improve Image Classification with Generated Data

1 code implementation • 24 May 2023 • Yongchao Zhou, Hshmat Sahak, Jimmy Ba

In this paper, we present Diffusion Inversion, a simple yet effective method that leverages the pre-trained generative model, Stable Diffusion, to generate diverse, high-quality training data for image classification.

Data Augmentation Few-Shot Learning +2

Paper
Code

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

2 code implementations • NeurIPS 2023 • Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto

As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003.

Instruction Following

1,097

Paper
Code

Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding

1 code implementation • 19 May 2023 • Augustin Toma, Patrick R. Lawler, Jimmy Ba, Rahul G. Krishnan, Barry B. Rubin, Bo wang

We present Clinical Camel, an open large language model (LLM) explicitly tailored for clinical research.

Language Modelling Large Language Model

Paper
Code

Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization

1 code implementation • 6 May 2023 • Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Jimmy Ba, Amjad Almahairi

In this work, we introduce Residual Prompt Tuning - a simple and efficient method that significantly improves the performance and stability of prompt tuning.

Paper
Code

TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation

2 code implementations • 26 Apr 2023 • Zhaoyan Liu, Noel Vouitsis, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem

We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models.

Ranked #31 on Text-to-Image Generation on MS COCO

Text-to-Image Generation

Paper
Code

Boosted Prompt Ensembles for Large Language Models

1 code implementation • 12 Apr 2023 • Silviu Pitis, Michael R. Zhang, Andrew Wang, Jimmy Ba

Methods such as chain-of-thought prompting and self-consistency have pushed the frontier of language model reasoning performance with no additional training.

GSM8K Language Modelling

Paper
Code

Mastering Diverse Domains through World Models

7 code implementations • 10 Jan 2023 • Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence.

Ranked #6 on Atari Games 100k on Atari 100k

Atari Games 100k Decision Making +2

2,539

Paper
Code

Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

no code implementations • 7 Dec 2022 • Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications.

Paper
Add Code

Large Language Models Are Human-Level Prompt Engineers

2 code implementations • 3 Nov 2022 • Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba

By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers.

Few-Shot Learning In-Context Learning +3

989

Paper
Code

Exploring Low Rank Training of Deep Neural Networks

no code implementations • 27 Sep 2022 • Siddhartha Rao Kamalakara, Acyr Locatelli, Bharat Venkitesh, Jimmy Ba, Yarin Gal, Aidan N. Gomez

Training deep neural networks in low rank, i. e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time.

Paper
Add Code

Dataset Distillation using Neural Feature Regression

2 code implementations • 1 Jun 2022 • Yongchao Zhou, Ehsan Nezhadarya, Jimmy Ba

Dataset distillation can be formulated as a bi-level meta-learning problem where the outer loop optimizes the meta-dataset and the inner loop trains a model on the distilled data.

Continual Learning Image Classification +2

1,158

Paper
Code

You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments

no code implementations • 31 May 2022 • Keiran Paster, Sheila Mcilraith, Jimmy Ba

In all tested domains, ESPER achieves significantly better alignment between the target return and achieved return than simply conditioning on returns.

Offline RL Playing the Game of 2048

Paper
Add Code

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations • 3 May 2022 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.

Paper
Add Code

Learning Domain Invariant Representations in Goal-conditioned Block MDPs

1 code implementation • NeurIPS 2021 • Beining Han, Chongyi Zheng, Harris Chan, Keiran Paster, Michael R. Zhang, Jimmy Ba

These changes are often spurious and unrelated to the underlying problem, such as background shifts for visual input agents.

Domain Generalization Reinforcement Learning (RL)

Paper
Code

Understanding the Variance Collapse of SVGD in High Dimensions

no code implementations • ICLR 2022 • Jimmy Ba, Murat A Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang

Stein variational gradient descent (SVGD) is a deterministic inference algorithm that evolves a set of particles to fit a target distribution.

Computational Efficiency Vocal Bursts Intensity Prediction

Paper
Add Code

Clockwork Variational Autoencoders

2 code implementations • NeurIPS 2021 • Vaibhav Saxena, Jimmy Ba, Danijar Hafner

We introduce the Clockwork VAE (CW-VAE), a video prediction model that leverages a hierarchy of latent sequences, where higher levels tick at slower intervals.

Video Prediction

Paper
Code

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

1 code implementation • 15 Jan 2021 • Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks.

Inductive Bias Mathematical Reasoning

Paper
Code

Video Prediction with Variational Temporal Hierarchies

no code implementations • 1 Jan 2021 • Vaibhav Saxena, Jimmy Ba, Danijar Hafner

Deep learning has shown promise for accurately predicting high-dimensional video sequences.

Video Prediction

Paper
Add Code

How Does a Neural Network's Architecture Impact Its Robustness to Noisy Labels?

no code implementations • NeurIPS 2021 • Jingling Li, Mozhi Zhang, Keyulu Xu, John P. Dickerson, Jimmy Ba

Our framework measures a network's robustness via the predictive power in its representations -- the test performance of a linear model trained on the learned representations using a small set of clean labels.

Learning with noisy labels

Paper
Add Code

Evaluating Agents without Rewards

1 code implementation • 21 Dec 2020 • Brendon Matusch, Jimmy Ba, Danijar Hafner

Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.

Atari Games

Paper
Code

Planning from Pixels using Inverse Dynamics Models

no code implementations • ICLR 2021 • Keiran Paster, Sheila A. McIlraith, Jimmy Ba

Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents.

Paper
Add Code

Mastering Atari with Discrete World Models

9 code implementations • ICLR 2021 • Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

The world model uses discrete representations and is trained separately from the policy.

Ranked #3 on Atari Games on Atari 2600 Skiing (using extra training data)

Atari Games

2,539

Paper
Code

Action and Perception as Divergence Minimization

1 code implementation • 3 Sep 2020 • Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

While the narrow objectives correspond to domain-specific rewards as typical in reinforcement learning, the general objectives maximize information with the environment through latent variable models of input sequences.

Decision Making Representation Learning

Paper
Code

A Study of Gradient Variance in Deep Learning

1 code implementation • 9 Jul 2020 • Fartash Faghri, David Duvenaud, David J. Fleet, Jimmy Ba

We introduce a method, Gradient Clustering, to minimize the variance of average mini-batch gradient with stratified sampling.

Clustering

Paper
Code

The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning

3 code implementations • 8 Jul 2020 • Yuhuai Wu, Honghua Dong, Roger Grosse, Jimmy Ba

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM).

Zero-shot Generalization

Paper
Code

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

1 code implementation • ICLR 2021 • Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse

In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time.

Automated Theorem Proving

Paper
Code

Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

2 code implementations • ICML 2020 • Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba

What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks?

Multi-Goal Reinforcement Learning reinforcement-learning +1

103

Paper
Code

When Does Preconditioning Help or Hurt Generalization?

no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

regression Second-order methods

Paper
Add Code

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

no code implementations • ICLR 2020 • Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

This paper investigates the generalization properties of two-layer neural networks in high-dimensions, i. e. when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.

Inductive Bias Vocal Bursts Valence Prediction

Paper
Add Code

BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

5 code implementations • ICLR 2020 • Yeming Wen, Dustin Tran, Jimmy Ba

We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having a much lower computational and memory costs.

664

Paper
Code

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

2 code implementations • ICLR 2020 • Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba

When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically--to prove convergence and optimality guarantees--and empirically--as an inductive bias.

Inductive Bias Metric Learning +3

Paper
Code

Dream to Control: Learning Behaviors by Latent Imagination

20 code implementations • ICLR 2020 • Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Learned world models summarize an agent's experience to facilitate learning complex behaviors.

Continuous Control reinforcement-learning +1

31,072

Paper
Code

Towards Characterizing the High-dimensional Bias of Kernel-based Particle Inference Algorithms

no code implementations • pproximateinference AABI Symposium 2019 • Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang

Particle-based inference algorithm is a promising method to efficiently generate samples for an intractable target distribution by iteratively updating a set of particles.

LEMMA

Paper
Add Code

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

no code implementations • ICLR 2020 • Yuanhao Wang, Guodong Zhang, Jimmy Ba

Many tasks in modern machine learning can be formulated as finding equilibria in \emph{sequential} games.

Paper
Add Code

A Non-asymptotic comparison of SVRG and SGD: tradeoffs between compute and speed

no code implementations • 25 Sep 2019 • Qingru Zhang, Yuhuai Wu, Fartash Faghri, Tianzong Zhang, Jimmy Ba

In this paper, we present a non-asymptotic analysis of SVRG under a noisy least squares regression problem.

Computational Efficiency regression +1

Paper
Add Code

Lookahead Optimizer: k steps forward, 1 step back

19 code implementations • NeurIPS 2019 • Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.

Ranked #1 on Stochastic Optimization on ImageNet ResNet-50 - 60 Epochs

Image Classification Machine Translation +2

29,758

Paper
Code

Benchmarking Model-Based Reinforcement Learning

2 code implementations • 3 Jul 2019 • Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba

Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL.

Benchmarking Model-based Reinforcement Learning +3

Paper
Code

Exploring Model-based Planning with Policy Networks

1 code implementation • ICLR 2020 • Tingwu Wang, Jimmy Ba

Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in terms of both sample efficiency and asymptotic performance.

Benchmarking Model-based Reinforcement Learning +1

Paper
Code

Neural Graph Evolution: Towards Efficient Automatic Robot Design

1 code implementation • 12 Jun 2019 • Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

Paper
Code

Graph Normalizing Flows

1 code implementation • NeurIPS 2019 • Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky

We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation.

Paper
Code

ACTRCE: Augmenting Experience via Teacher’s Advice

no code implementations • ICLR 2019 • Yuhuai Wu, Harris Chan, Jamie Kiros, Sanja Fidler, Jimmy Ba

Sparse reward is one of the most challenging problems in reinforcement learning (RL).

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Neural Graph Evolution: Automatic Robot Design

no code implementations • ICLR 2019 • Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

Paper
Add Code

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

no code implementations • 21 Feb 2019 • Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

We demonstrate that the learning performance of our method is more accurately captured by the structure of the covariance matrix of the noise rather than by the variance of gradients.

Stochastic Optimization

Paper
Add Code

DOM-Q-NET: Grounded RL on Structured Language

1 code implementation • ICLR 2019 • Sheng Jia, Jamie Kiros, Jimmy Ba

Building agents to interact with the web would allow for significant improvements in knowledge understanding and representation learning.

Reinforcement Learning (RL) Representation Learning

Paper
Code

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

no code implementations • 12 Feb 2019 • Harris Chan, Yuhuai Wu, Jamie Kiros, Sanja Fidler, Jimmy Ba

We first analyze the differences among goal representation, and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with non-language goal representation failed to learn.

Multi-Goal Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Reversible Recurrent Neural Networks

1 code implementation • NeurIPS 2018 • Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Reversible RNNs---RNNs for which the hidden-to-hidden transition can be reversed---offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation.

Paper
Code

Exploring Curvature Noise in Large-Batch Stochastic Optimization

no code implementations • 27 Sep 2018 • Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

Unfortunately, a major drawback is the so-called generalization gap: large-batch training typically leads to a degradation in generalization performance of the model as compared to small-batch training.

Stochastic Optimization

Paper
Add Code

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

3 code implementations • ICLR 2018 • Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies.

451

Paper
Code

On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

no code implementations • NeurIPS 2018 • Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, Jason D. Lee

A popular GAN formulation is based on the use of Wasserstein distance as a metric between probability distributions.

Paper
Add Code

NerveNet: Learning Structured Policy with Graph Neural Networks

1 code implementation • ICLR 2018 • Tingwu Wang, Renjie Liao, Jimmy Ba, Sanja Fidler

We address the problem of learning structured policies for continuous control.

Benchmarking Continuous Control +3

108

Paper
Code

Kronecker-factored Curvature Approximations for Recurrent Neural Networks

no code implementations • ICLR 2018 • James Martens, Jimmy Ba, Matt Johnson

Kronecker-factor Approximate Curvature (Martens & Grosse, 2015) (K-FAC) is a 2nd-order optimization method which has been shown to give state-of-the-art performance on large-scale neural network optimization tasks (Ba et al., 2017).

Paper
Add Code

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

8 code implementations • NeurIPS 2017 • Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

Atari Games Continuous Control +2

15,334

Paper
Code

Using Fast Weights to Attend to the Recent Past

3 code implementations • NeurIPS 2016 • Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs.

262

Paper
Code

Learning Wake-Sleep Recurrent Attention Models

no code implementations • NeurIPS 2015 • Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey

Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations.

Caption Generation Computational Efficiency +2

Paper
Add Code

Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions

no code implementations • ICCV 2015 • Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov

One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images.

Zero-Shot Learning

Paper
Add Code

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

88 code implementations • 10 Feb 2015 • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.

Caption Generation Image Captioning +1

2,654

Paper
Code

Multiple Object Recognition with Visual Attention

5 code implementations • 24 Dec 2014 • Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu

We present an attention-based model for recognizing multiple objects in images.

Object Object Recognition +2

Paper
Code

Adam: A Method for Stochastic Optimization

82 code implementations • 22 Dec 2014 • Diederik P. Kingma, Jimmy Ba

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

Stochastic Optimization

47,992

Paper
Code

Adaptive dropout for training deep neural networks

no code implementations • NeurIPS 2013 • Jimmy Ba, Brendan Frey

For example, our model achieves 5. 8% error on the NORB test set, which is better than state-of-the-art results obtained using convolutional architectures. "

Denoising

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.