7 code implementations • CVPR 2021 • Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens
Self-attention models have recently shown encouraging improvements in accuracy-parameter trade-offs over baseline convolutional models such as ResNet-50.
Ranked #212 on Image Classification on ImageNet
2 code implementations • ICML 2020 • William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.
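For background, a minimal sketch of the data structure in question; the capacity, batch size, and uniform sampling scheme here are illustrative, not the paper's experimental settings:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform-sampling experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling decorrelates consecutive transitions for off-policy updates.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```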
no code implementations • ICML 2020 • Gamaleldin F. Elsayed, Prajit Ramachandran, Jonathon Shlens, Simon Kornblith
Convolutional neural networks are among the most successful architectures in deep learning, and this success is at least partially attributable to the efficacy of spatial invariance as an inductive bias.
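To make the inductive bias concrete, a small sketch contrasting a convolution's shared filter with a fully locally connected alternative (illustrative only; the paper's contribution is a low-rank relaxation between these two extremes):

```python
import torch
import torch.nn as nn

# A convolution shares one filter across all spatial positions (spatial invariance);
# a locally connected layer learns a separate filter per position, relaxing that prior.
conv = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1, bias=False)

H = W = 8  # illustrative feature-map size
# Hypothetical locally connected layer: one weight tensor per spatial location.
local_weights = torch.randn(H, W, 16, 16 * 3 * 3)  # no sharing across positions

conv_params = sum(p.numel() for p in conv.parameters())
local_params = local_weights.numel()
print(conv_params, local_params)  # 2304 vs 147456: the cost of dropping weight sharing
```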
8 code implementations • NeurIPS 2019 • Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions.
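A simplified sketch of what such a stand-alone primitive can look like, attending over local pixel neighborhoods (illustrative; the paper's layer also uses relative position embeddings and multiple heads):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalSelfAttention2d(nn.Module):
    """Simplified local self-attention over k x k pixel neighborhoods."""

    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        self.q = nn.Conv2d(channels, channels, 1)
        self.kv = nn.Conv2d(channels, 2 * channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x)                                   # one query per pixel
        kv = self.kv(x)
        kv = F.unfold(kv, self.k, padding=self.k // 2)  # gather k*k neighborhoods
        kv = kv.view(b, 2 * c, self.k * self.k, h * w)
        k_, v = kv[:, :c], kv[:, c:]
        attn = (q.view(b, c, 1, h * w) * k_).sum(1, keepdim=True) / c ** 0.5
        attn = attn.softmax(dim=2)                      # weights over the neighborhood
        out = (attn * v).sum(dim=2)                     # weighted sum of neighbor values
        return out.view(b, c, h, w)

x = torch.randn(1, 32, 16, 16)
print(LocalSelfAttention2d(32)(x).shape)  # torch.Size([1, 32, 16, 16])
```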
no code implementations • ICLR 2019 • Prajit Ramachandran, Quoc V. Le
Both architectural diversity and routing depth can increase the representational power of a routing network; a minimal sketch of per-example routing follows this entry.
Ranked #2 on Multi-Task Learning on OMNIGLOT
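As referenced above, a minimal sketch of per-example routing, with architecturally diverse sublayers and stacked (deep) routing; the expert choices and soft mixing here are illustrative, not the paper's exact model:

```python
import torch
import torch.nn as nn

class RoutedLayer(nn.Module):
    """Illustrative per-example routing: a router softly mixes diverse sublayers."""

    def __init__(self, dim=64):
        super().__init__()
        # Architectural diversity: candidate ops of different types.
        self.experts = nn.ModuleList([
            nn.Linear(dim, dim),
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)),
            nn.Identity(),
        ])
        self.router = nn.Linear(dim, len(self.experts))

    def forward(self, x):
        weights = self.router(x).softmax(dim=-1)          # per-example mixture weights
        outs = torch.stack([e(x) for e in self.experts], dim=-1)
        return (outs * weights.unsqueeze(1)).sum(-1)

# Routing depth: stacking routed layers compounds the number of candidate paths.
model = nn.Sequential(RoutedLayer(), RoutedLayer())
print(model(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```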
no code implementations • 8 Aug 2018 • Maximilian Alber, Irwan Bello, Barret Zoph, Pieter-Jan Kindermans, Prajit Ramachandran, Quoc Le
The back-propagation algorithm is the cornerstone of deep learning.
21 code implementations • ICLR 2018 • Prajit Ramachandran, Barret Zoph, Quoc V. Le
The simplicity of Swish and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.
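Swish is f(x) = x * sigmoid(beta * x), which makes the drop-in replacement straightforward; a minimal sketch:

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish activation: f(x) = x * sigmoid(beta * x); beta = 1 recovers the
    common form (also known as SiLU). A drop-in replacement for ReLU."""

    def __init__(self, beta=1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)

# Replacing ReLU with Swish requires no other changes to the network:
mlp = nn.Sequential(nn.Linear(128, 256), Swish(), nn.Linear(256, 10))
print(mlp(torch.randn(2, 128)).shape)  # torch.Size([2, 10])
```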
1 code implementation • 20 Apr 2017 • Prajit Ramachandran, Tom Le Paine, Pooya Khorrami, Mohammad Babaeizadeh, Shiyu Chang, Yang Zhang, Mark A. Hasegawa-Johnson, Roy H. Campbell, Thomas S. Huang
In this work, we describe a method to speed up generation in convolutional autoregressive models.
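The core idea is to cache per-layer activations so each generated sample reuses past computation instead of re-running the convolution stack. A minimal sketch for dilated causal convolutions with kernel size 2 (the layer sizes, weights, and linear layer form are illustrative):

```python
from collections import deque
import numpy as np

# Keep a FIFO queue of past activations per layer so each new sample costs
# O(num_layers) rather than re-computing over the whole receptive field.

def make_queues(dilations, channels):
    # One queue per layer, holding exactly `dilation` past activations.
    return [deque(np.zeros((d, channels)), maxlen=d) for d in dilations]

def step(x, weights, queues):
    """Generate one timestep. weights[i] maps (past, current) -> current."""
    for (w_past, w_cur), q in zip(weights, queues):
        past = q[0]        # activation from `dilation` steps ago
        q.append(x)        # push current activation for future steps
        x = np.tanh(past @ w_past + x @ w_cur)
    return x

dilations = [1, 2, 4, 8]
C = 16
rng = np.random.default_rng(0)
weights = [(rng.standard_normal((C, C)) * 0.1, rng.standard_normal((C, C)) * 0.1)
           for _ in dilations]
queues = make_queues(dilations, C)
sample = np.zeros(C)
for _ in range(5):         # each step is O(len(dilations)), not O(receptive field)
    sample = step(sample, weights, queues)
print(sample.shape)        # (16,)
```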
no code implementations • 7 Apr 2017 • Yang Liu, Prajit Ramachandran, Qiang Liu, Jian Peng
Policy gradient methods have been successfully applied to many complex reinforcement learning problems.
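For background, a minimal sketch of the vanilla (REINFORCE-style) policy gradient that such methods build on; this is not the paper's method, and the states and returns below are random stand-ins:

```python
import torch
import torch.nn as nn

# Maximize E[R] by ascending grad log pi(a|s) * R.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(8, 4)                         # stand-in for sampled states
dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()
returns = torch.randn(8)                           # stand-in for observed returns
loss = -(dist.log_prob(actions) * returns).mean()  # negative surrogate objective
optimizer.zero_grad()
loss.backward()
optimizer.step()
```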
6 code implementations • 29 Nov 2016 • Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang
This paper presents Fast Wavenet, an efficient implementation of the WaveNet generation process.
no code implementations • EMNLP 2017 • Prajit Ramachandran, Peter J. Liu, Quoc V. Le
We apply this method, which initializes the weights of a sequence-to-sequence model with pretrained language models, to challenging benchmarks in machine translation and abstractive summarization, and find that it significantly improves the subsequent supervised models.
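A minimal sketch of that recipe: warm-start a sequence-to-sequence model's encoder and decoder from pretrained language models, then fine-tune on the supervised task (module shapes and names here are hypothetical):

```python
import torch.nn as nn

dim = 256

src_lm = nn.LSTM(dim, dim, batch_first=True)   # LM pretrained on source-side text
tgt_lm = nn.LSTM(dim, dim, batch_first=True)   # LM pretrained on target-side text

encoder = nn.LSTM(dim, dim, batch_first=True)
decoder = nn.LSTM(dim, dim, batch_first=True)

# Warm-start the seq2seq model from the pretrained LMs before supervised training.
encoder.load_state_dict(src_lm.state_dict())
decoder.load_state_dict(tgt_lm.state_dict())
```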
1 code implementation • 26 Feb 2016 • Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang
Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip.