Fast MNAS: Uncertainty-aware Neural Architecture Search with Lifelong Learning
Anonymous
|
2021-01-01
|
Optimizing Information Bottleneck in Reinforcement Learning: A Stein Variational Approach
Anonymous
|
2021-01-01
|
PGPS: Coupling Policy Gradient with Population-based Search
Anonymous
|
2021-01-01
|
On Proximal Policy Optimization's Heavy-Tailed Gradients
Anonymous
|
2021-01-01
|
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms
Anonymous
|
2021-01-01
|
A Strong On-Policy Competitor To PPO
Anonymous
|
2021-01-01
|
Deep Coherent Exploration For Continuous Control
Anonymous
|
2021-01-01
|
Grounded Compositional Generalization with Environment Interactions
Anonymous
|
2021-01-01
|
Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup
Anonymous
|
2021-01-01
|
Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning
Anonymous
|
2021-01-01
|
No MCMC for me: Amortized sampling for fast and stable training of energy-based models
Anonymous
|
2021-01-01
|
Policy Optimization in Zero-Sum Markov Games: Fictitious Self-Play Provably Attains Nash Equilibria
Anonymous
|
2021-01-01
|
Warpspeed Computation of Optimal Transport, Graph Distances, and Embedding Alignment
Anonymous
|
2021-01-01
|
Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup
Han Shen
•
Kaiqing Zhang
•
Mingyi Hong
•
Tianyi Chen
|
2020-12-31
|
A liquid scintillator for a neutrino detector working at -50 degree
Zhangquan Xie
•
Jun Cao
•
Yayun Ding
•
Mengchao Liu
•
Xilei Sun
•
Wei Wang
•
Yuguang Xie
|
2020-12-22
|
Policy Gradient RL Algorithms as Directed Acyclic Graphs
Juan Jose Garau Luis
|
2020-12-14
|
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi
•
Yin Cui
•
Aravind Srinivas
•
Rui Qian
•
Tsung-Yi Lin
•
Ekin D. Cubuk
•
Quoc V. Le
•
Barret Zoph
|
2020-12-13
|
Proximal Policy Optimization Smoothed Algorithm
Wangshu Zhu
•
Andre Rosendo
|
2020-12-04
|
Domain Generalization via Entropy Regularization
Shanshan Zhao
•
Mingming Gong
•
Tongliang Liu
•
Huan Fu
•
DaCheng Tao
|
2020-12-01
|
Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method
Qi Zhou
•
Yufei Kuang
•
Zherui Qiu
•
Houqiang Li
•
Jie Wang
|
2020-12-01
|
On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
Elena Smirnova
•
Elvis Dohmatob
|
2020-12-01
|
Enhanced Scene Specificity with Sparse Dynamic Value Estimation
Jaskirat Singh
•
Liang Zheng
|
2020-11-25
|
FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
Xiao-Yang Liu
•
Hongyang Yang
•
Qian Chen
•
Runjia Zhang
•
Liuqing Yang
•
Bowen Xiao
•
Christina Dan Wang
|
2020-11-19
|
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
Christian Schroeder de Witt
•
Tarun Gupta
•
Denys Makoviichuk
•
Viktor Makoviychuk
•
Philip H. S. Torr
•
Mingfei Sun
•
Shimon Whiteson
|
2020-11-18
|
Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking
Fabio Pardo
|
2020-11-15
|
Proximal Policy Optimization via Enhanced Exploration Efficiency
Junwei Zhang
•
Zhenghao Zhang
•
Shuai Han
•
Shuai Lü
|
2020-11-11
|
Drafting in Collectible Card Games via Reinforcement Learning
Ronaldo Vieira
•
Anderson Rocha Tavares
•
Luiz Chaimowicz
|
2020-11-07
|
Proximal Policy Gradient: PPO with Policy Gradient
Ju-Seung Byun
•
Byungmoon Kim
•
Huamin Wang
|
2020-10-20
|
Recurrent Distributed Reinforcement Learning for Partially Observable Robotic Assembly
Jieliang Luo
•
Hui Li
|
2020-10-15
|
Discrete Latent Space World Models for Reinforcement Learning
Jan Robine
•
Tobias Uelwer
•
Stefan Harmeling
|
2020-10-12
|
Automated Concatenation of Embeddings for Structured Prediction
Xinyu Wang
•
Yong Jiang
•
Nguyen Bach
•
Tao Wang
•
Zhongqiang Huang
•
Fei Huang
•
Kewei Tu
|
2020-10-10
|
No MCMC for me: Amortized sampling for fast and stable training of energy-based models
Will Grathwohl
•
Jacob Kelly
•
Milad Hashemi
•
Mohammad Norouzi
•
Kevin Swersky
•
David Duvenaud
|
2020-10-08
|
Proximal Policy Optimization with Relative Pearson Divergence
Taisuke Kobayashi
|
2020-10-07
|
Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation
Minki Kang
•
Moonsu Han
•
Sung Ju Hwang
|
2020-10-06
|
Entropy Regularization for Mean Field Games with Learning
Xin Guo
•
Renyuan Xu
•
Thaleia Zariphopoulou
|
2020-09-30
|
Revisiting Design Choices in Proximal Policy Optimization
Chloe Ching-Yun Hsu
•
Celestine Mendler-Dünner
•
Moritz Hardt
|
2020-09-23
|
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering
Doyup Lee
•
Yeongjae Cheon
•
Wook-Shin Han
|
2020-09-21
|
Phasic Policy Gradient
Karl Cobbe
•
Jacob Hilton
•
Oleg Klimov
•
John Schulman
|
2020-09-09
|
Data-Driven Transferred Energy Management Strategy for Hybrid Electric Vehicles via Deep Reinforcement Learning
Jiangdong Liao
•
Teng Liu
•
Wenhao Tan
•
Shaobo Lu
•
Yalian Yang
|
2020-09-07
|
DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control
Pengyuan Zhou
•
Xianfu Chen
•
Zhi Liu
•
Tristan Braud
•
Pan Hui
•
Jussi Kangasharju
|
2020-09-03
|
Dynamic Scheduling for Stochastic Edge-Cloud Computing Environments using A3C learning and Residual Recurrent Neural Networks
Shreshth Tuli
•
Shashikant Ilager
•
Kotagiri Ramamohanarao
•
Rajkumar Buyya
|
2020-09-01
|
On the model-based stochastic value gradient for continuous reinforcement learning
Brandon Amos
•
Samuel Stanton
•
Denis Yarats
•
Andrew Gordon Wilson
|
2020-08-28
|
Cross-regional oil palm tree counting and detection via multi-level attention domain adaptation network
Juepeng Zheng
•
Haohuan Fu
•
Weijia Li
•
Wenzhao Wu
•
Yi Zhao
•
Runmin Dong
•
Le Yu
|
2020-08-26
|
Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning
Wenshuai Zhao
•
Jorge Peña Queralta
•
Li Qingqing
•
Tomi Westerlund
|
2020-08-18
|
Queueing Network Controls via Deep Reinforcement Learning
J. G. Dai
•
Mark Gluzman
|
2020-07-31
|
Lagrangian Duality in Reinforcement Learning
Pranay Pasula
|
2020-07-20
|
Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
Shicong Cen
•
Chen Cheng
•
Yuxin Chen
•
Yuting Wei
•
Yuejie Chi
|
2020-07-13
|
Maximum Entropy Regularization and Chinese Text Recognition
Changxu Cheng
•
Wuheng Xu
•
Xiang Bai
•
Bin Feng
•
Wenyu Liu
|
2020-07-09
|
Learning Implicit Credit Assignment for Multi-Agent Actor-Critic
Meng Zhou
•
Ziyu Liu
•
Pengwei Sui
•
Yixuan Li
•
Yuk Ying Chung
|
2020-07-06
|
Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning
Aleksei Petrenko
•
Zhehui Huang
•
Tushar Kumar
•
Gaurav Sukhatme
•
Vladlen Koltun
|
2020-06-21
|
An operator view of policy gradient methods
Dibya Ghosh
•
Marlos C. Machado
•
Nicolas Le Roux
|
2020-06-19
|
Fine-Tuning DARTS for Image Classification
Muhammad Suhaib Tanveer
•
Muhammad Umar Karim Khan
•
Chong-Min Kyung
|
2020-06-16
|
Optimistic Distributionally Robust Policy Optimization
Jun Song
•
Chaoyue Zhao
|
2020-06-14
|
Exploration by Maximizing Rényi Entropy for Zero-Shot Meta RL
Chuheng Zhang
•
Yuanying Cai
•
Longbo Huang
•
Jian Li
|
2020-06-11
|
Rethinking Pre-training and Self-training
Barret Zoph
•
Golnaz Ghiasi
•
Tsung-Yi Lin
•
Yin Cui
•
Hanxiao Liu
•
Ekin D. Cubuk
•
Quoc V. Le
|
2020-06-11
|
A Comparison of Self-Play Algorithms Under a Generalized Framework
Daniel Hernandez
•
Kevin Denamganai
•
Sam Devlin
•
Spyridon Samothrakis
•
James Alfred Walker
|
2020-06-08
|
Optimization and passive flow control using single-step deep reinforcement learning
H. Ghraieb
•
J. Viquerat
•
A. Larcher
•
P. Meliga
•
E. Hachem
|
2020-06-04
|
Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration
Seungyul Han
•
Youngchul Sung
|
2020-06-02
|
Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning
Jaskirat Singh
•
Liang Zheng
|
2020-05-25
|
Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
Logan Engstrom
•
Andrew Ilyas
•
Shibani Santurkar
•
Dimitris Tsipras
•
Firdaus Janoos
•
Larry Rudolph
•
Aleksander Madry
|
2020-05-25
|
Mirror Descent Policy Optimization
Manan Tomar
•
Lior Shani
•
Yonathan Efroni
•
Mohammad Ghavamzadeh
|
2020-05-20
|
On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei
•
Chenjun Xiao
•
Csaba Szepesvari
•
Dale Schuurmans
|
2020-05-13
|
Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics
Antonin Raffin
•
Freek Stulp
|
2020-05-12
|
Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing
Clara Meister
•
Elizabeth Salesky
•
Ryan Cotterell
|
2020-05-02
|
Model-based reinforcement learning for biological sequence design
Christof Angermueller
•
David Dohan
•
David Belanger
•
Ramya Deshpande
•
Kevin Murphy
•
Lucy Colwell
|
2020-05-01
|
Look at the First Sentence: Position Bias in Question Answering
Miyoung Ko
•
Jinhyuk Lee
•
Hyunjae Kim
•
Gangwoo Kim
•
Jaewoo Kang
|
2020-04-30
|
Robust active flow control over a range of Reynolds numbers using an artificial neural network trained through deep reinforcement learning
Hongwei Tang
•
Jean Rabault
•
Alexander Kuhnle
•
Yan Wang
•
Tongguang Wang
|
2020-04-26
|
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Shangtong Zhang
•
Bo Liu
•
Shimon Whiteson
|
2020-04-22
|
Solving the scalarization issues of Advantage-based Reinforcement Learning Algorithms
Federico A. Galatolo
•
Mario G. C. A. Cimino
•
Gigliola Vaglini
|
2020-04-08
|
Guided Dialog Policy Learning without Adversarial Learning in the Loop
Ziming Li
•
Sungjin Lee
•
Baolin Peng
•
Jinchao Li
•
Shahin Shayandeh
•
Jianfeng Gao
|
2020-04-07
|
Evolving Normalization-Activation Layers
Hanxiao Liu
•
Andrew Brock
•
Karen Simonyan
•
Quoc V. Le
|
2020-04-06
|
Leverage the Average: an Analysis of Regularization in RL
Nino Vieillard
•
Tadashi Kozuno
•
Bruno Scherrer
•
Olivier Pietquin
•
Rémi Munos
•
Matthieu Geist
|
2020-03-31
|
MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning
Yuan Gao
•
Haoping Bai
•
Zequn Jie
•
Jiayi Ma
•
Kui Jia
•
Wei Liu
|
2020-03-31
|
Obstacle Avoidance and Navigation Utilizing Reinforcement Learning with Reward Shaping
Daniel Zhang
•
Colleen P. Bailey
|
2020-03-28
|
Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations
Huan Zhang
•
Hongge Chen
•
Chaowei Xiao
•
Bo Li
•
Mingyan Liu
•
Duane Boning
•
Cho-Jui Hsieh
|
2020-03-19
|
Adaptive Discretization for Continuous Control using Particle Filtering Policy Network
Pei Xu
•
Ioannis Karamouzas
|
2020-03-16
|
Explore and Exploit with Heterotic Line Bundle Models
Magdalena Larfors
•
Robin Schneider
|
2020-03-10
|
Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors
Rituraj Kaushik
•
Timothée Anne
•
Jean-Baptiste Mouret
|
2020-03-10
|
Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks
Xingyu Sha
•
Jiaqi Zhang
•
Kaiqing Zhang
•
Keyou You
•
Tamer Başar
|
2020-03-01
|
A Self-Tuning Actor-Critic Algorithm
Tom Zahavy
•
Zhongwen Xu
•
Vivek Veeriah
•
Matteo Hessel
•
Junhyuk Oh
•
Hado van Hasselt
•
David Silver
•
Satinder Singh
|
2020-02-28
|
A Visual Communication Map for Multi-Agent Deep Reinforcement Learning
Ngoc Duy Nguyen
•
Thanh Thi Nguyen
•
Saeid Nahavandi
|
2020-02-27
|
Generalized Product Quantization Network for Semi-supervised Image Retrieval
Young Kyun Jang
•
Nam Ik Cho
|
2020-02-26
|
Reinforcement Learning Framework for Deep Brain Stimulation Study
Dmitrii Krylov
•
Remi Tachet
•
Romain Laroche
•
Michael Rosenblum
•
Dmitry V. Dylov
|
2020-02-22
|
Deep RL Agent for a Real-Time Action Strategy Game
Michal Warchalski
•
Dimitrije Radojevic
•
Milos Milosevic
|
2020-02-15
|
Temporal-adaptive Hierarchical Reinforcement Learning
Wen-Ji Zhou
•
Yang Yu
|
2020-02-06
|
Unsupervised Domain Adaptive Object Detection using Forward-Backward Cyclic Adaptation
Siqi Yang
•
Lin Wu
•
Arnold Wiliem
•
Brian C. Lovell
|
2020-02-03
|
Brain Metastasis Segmentation Network Trained with Robustness to Annotations with Multiple False Negatives
Darvin Yi
•
Endre Grøvik
•
Michael Iv
•
Elizabeth Tong
•
Greg Zaharchuk
•
Daniel Rubin
|
2020-01-26
|
Continuous-action Reinforcement Learning for Playing Racing Games: Comparing SPG to PPO
Mario S. Holubar
•
Marco A. Wiering
|
2020-01-15
|
Intelligent Roundabout Insertion using Deep Reinforcement Learning
Alessandro Paolo Capasso
•
Giulio Bacchiani
•
Daniele Molinari
|
2020-01-03
|
Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning
Anonymous
|
2020-01-01
|
Learning Representations in Reinforcement Learning: an Information Bottleneck Approach
Yingjun Pei
•
Xinwen Hou
|
2020-01-01
|
TPO: Tree Search Policy Optimization for Continuous Action Spaces
Amir Yazdanbakhsh
•
Ebrahim Songhori
•
Robert Ormandi
•
Anna Goldie
•
Azalia Mirhoseini
|
2020-01-01
|
Model-based reinforcement learning for biological sequence design
Anonymous
|
2020-01-01
|
Improving Exploration of Deep Reinforcement Learning using Planning for Policy Search
Anonymous
|
2020-01-01
|
Implementation Matters in Deep RL: A Case Study on PPO and TRPO
Anonymous
|
2020-01-01
|
SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning
Keng Wah Loon
•
Laura Graesser
•
Milan Cvitkovic
|
2019-12-28
|
Soft Q-network
Jingbin Liu
•
Xinyang Gu
•
Shuai Liu
•
Dexiang Zhang
|
2019-12-20
|
Mastering Complex Control in MOBA Games with Deep Reinforcement Learning
Deheng Ye
•
Zhao Liu
•
Mingfei Sun
•
Bei Shi
•
Peilin Zhao
•
Hao Wu
•
Hongsheng Yu
•
Shaojie Yang
•
Xipeng Wu
•
Qingwei Guo
•
Qiaobo Chen
•
Yinyuting Yin
•
Hao Zhang
•
Tengfei Shi
•
Liang Wang
•
Qiang Fu
•
Wei Yang
•
Lanxiao Huang
|
2019-12-20
|
Marginalized State Distribution Entropy Regularization in Policy Optimization
Riashat Islam
•
Zafarali Ahmed
•
Doina Precup
|
2019-12-11
|
Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
Riashat Islam
•
Raihan Seraj
•
Pierre-Luc Bacon
•
Doina Precup
|
2019-12-11
|
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Xianzhi Du
•
Tsung-Yi Lin
•
Pengchong Jin
•
Golnaz Ghiasi
•
Mingxing Tan
•
Yin Cui
•
Quoc V. Le
•
Xiaodan Song
|
2019-12-10
|
Intelligent Coordination among Multiple Traffic Intersections Using Multi-Agent Reinforcement Learning
Ujwal Padam Tewari
•
Vishal Bidawatka
•
Varsha Raveendran
•
Vinay Sudhakaran
•
Shreedhar Kodate Shreeshail
•
Jayanth Prakash Kulkarni
|
2019-12-09
|
MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices
Bo Chen
•
Golnaz Ghiasi
•
Hanxiao Liu
•
Tsung-Yi Lin
•
Dmitry Kalenichenko
•
Hartwig Adams
•
Quoc V. Le
|
2019-12-02
|
On-policy Reinforcement Learning with Entropy Regularization
Jingbin Liu
•
Xinyang Gu
•
Dexiang Zhang
•
Shuai Liu
|
2019-12-02
|
Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy
Boyi Liu
•
Qi Cai
•
Zhuoran Yang
•
Zhaoran Wang
|
2019-12-01
|
Automated curriculum generation for Policy Gradients from Demonstrations
Anirudh Srinivasan
•
Dzmitry Bahdanau
•
Maxime Chevalier-Boisvert
•
Yoshua Bengio
|
2019-12-01
|
Adversary A3C for Robust Reinforcement Learning
Zhaoyuan Gu
•
Zhenzhong Jia
•
Howie Choset
|
2019-12-01
|
Learning Reward Machines for Partially Observable Reinforcement Learning
Rodrigo Toro Icarte
•
Ethan Waldie
•
Toryn Klassen
•
Rick Valenzano
•
Margarita Castro
•
Sheila McIlraith
|
2019-12-01
|
IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Michael Luo
•
Jiahao Yao
•
Richard Liaw
•
Eric Liang
•
Ion Stoica
|
2019-11-30
|
Accelerating Training in Pommerman with Imitation and Reinforcement Learning
Hardik Meisheri
•
Omkar Shelke
•
Richa Verma
•
Harshad Khadilkar
|
2019-11-12
|
Learning Representations in Reinforcement Learning: An Information Bottleneck Approach
Pei Yingjun
•
Hou Xinwen
|
2019-11-12
|
Situated GAIL: Multitask imitation using task-conditioned adversarial inverse reinforcement learning
Kyoichiro Kobayashi
•
Takato Horii
•
Ryo Iwaki
•
Yukie Nagai
•
Minoru Asada
|
2019-11-01
|
HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators
Chengshu Li
•
Fei Xia
•
Roberto Martin-Martin
•
Silvio Savarese
|
2019-10-24
|
Regularization Matters in Policy Optimization
Zhuang Liu
•
Xuanlin Li
•
Bingyi Kang
•
Trevor Darrell
|
2019-10-21
|
Prescribed Generative Adversarial Networks
Adji B. Dieng
•
Francisco J. R. Ruiz
•
David M. Blei
•
Michalis K. Titsias
|
2019-10-09
|
TorchBeast: A PyTorch Platform for Distributed RL
Heinrich Küttler
•
Nantas Nardelli
•
Thibaut Lavril
•
Marco Selvatici
•
Viswanath Sivakumar
•
Tim Rocktäschel
•
Edward Grefenstette
|
2019-10-08
|
Randomized Shortest Paths with Net Flows and Capacity Constraints
Sylvain Courtain
•
Pierre Leleux
•
Ilkka Kivimaki
•
Guillaume Guex
•
Marco Saerens
|
2019-10-04
|
Quantized Reinforcement Learning (QUARL)
Srivatsan Krishnan
•
Sharad Chitlangia
•
Maximilian Lam
•
Zishen Wan
•
Aleksandra Faust
•
Vijay Janapa Reddi
|
2019-10-02
|
Forward-Backward Splitting for Optimal Transport based Problems
Guillermo Ortiz-Jimenez
•
Mireille El Gheche
•
Effrosyni Simou
•
Hermina Petric Maretic
•
Pascal Frossard
|
2019-09-20
|
Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning
Felix Leibfried
•
Jordi Grau-Moya
|
2019-09-11
|
VUSFA: Variational Universal Successor Features Approximator to Improve Transfer DRL for Target Driven Visual Navigation
Shamane Siriwardhana
•
Rivindu Weerasakera
•
Denys J. C. Matthies
•
Suranga Nanayakkara
|
2019-08-18
|
Incremental Reinforcement Learning --- a New Continuous Reinforcement Learning Frame Based on Stochastic Differential Equation methods
Tianhao Chen
•
Limei Cheng
•
Yang Liu
•
Wenchuan Jia
•
Shugen Ma
|
2019-08-08
|
DoorGym: A Scalable Door Opening Environment And Baseline Agent
Yusuke Urakami
•
Alec Hodgkinson
•
Casey Carlin
•
Randall Leu
•
Luca Rigazio
•
Pieter Abbeel
|
2019-08-05
|
Towards Model-based Reinforcement Learning for Industry-near Environments
Per-Arne Andersen
•
Morten Goodwin
•
Ole-Christoffer Granmo
|
2019-07-27
|
Unsupervised Domain Adaptation via Calibrating Uncertainties
Ligong Han
•
Yang Zou
•
Ruijiang Gao
•
Lezi Wang
•
Dimitris Metaxas
|
2019-07-25
|
Google Research Football: A Novel Reinforcement Learning Environment
Karol Kurach
•
Anton Raichuk
•
Piotr Stańczyk
•
Michał Zając
•
Olivier Bachem
•
Lasse Espeholt
•
Carlos Riquelme
•
Damien Vincent
•
Marcin Michalski
•
Olivier Bousquet
•
Sylvain Gelly
|
2019-07-25
|
Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning
Bilal Kartal
•
Pablo Hernandez-Leal
•
Matthew E. Taylor
|
2019-07-24
|
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
Pablo Hernandez-Leal
•
Bilal Kartal
•
Matthew E. Taylor
|
2019-07-22
|
PPO Dash: Improving Generalization in Deep Reinforcement Learning
Joe Booth
|
2019-07-15
|
Modified Actor-Critics
Erinc Merdivan
•
Sten Hanke
•
Matthieu Geist
|
2019-07-02
|
End-to-end Deep Reinforcement Learning Based Coreference Resolution
Hongliang Fei
•
Xu Li
•
Dingcheng Li
•
Ping Li
|
2019-07-01
|
Learning Data Augmentation Strategies for Object Detection
Barret Zoph
•
Ekin D. Cubuk
•
Golnaz Ghiasi
•
Tsung-Yi Lin
•
Jonathon Shlens
•
Quoc V. Le
|
2019-06-26
|
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Boyi Liu
•
Qi Cai
•
Zhuoran Yang
•
Zhaoran Wang
|
2019-06-25
|
Proximal Distilled Evolutionary Reinforcement Learning
Cristian Bodnar
•
Ben Day
•
Pietro Lió
|
2019-06-24
|
RL-Based Method for Benchmarking the Adversarial Resilience and Robustness of Deep Reinforcement Learning Policies
Vahid Behzadan
•
William Hsu
|
2019-06-03
|
Policy Search by Target Distribution Learning for Continuous Control
Chuheng Zhang
•
Yuanqi Li
•
Jian Li
|
2019-05-27
|
Combine PPO with NES to Improve Exploration
Lianjiang Li
•
Yunrong Yang
•
Bingna Li
|
2019-05-23
|
Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment
Jivitesh Sharma
•
Per-Arne Andersen
•
Ole-Christoffer Granmo
•
Morten Goodwin
|
2019-05-23
|
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
Seungyul Han
•
Youngchul Sung
|
2019-05-07
|
Autonomous Air Traffic Controller: A Deep Multi-Agent Reinforcement Learning Approach
Marc Brittain
•
Peng Wei
|
2019-05-02
|
Soft Q-Learning with Mutual-Information Regularization
Jordi Grau-Moya
•
Felix Leibfried
•
Peter Vrancx
|
2019-05-01
|
Supervised Policy Update
Quan Vuong
•
Yiming Zhang
•
Keith W. Ross
|
2019-05-01
|
Towards Combining On-Off-Policy Methods for Real-World Applications
Kai-Chun Hu
•
Chen-Huan Pi
•
Ting Han Wei
•
I-Chen Wu
•
Stone Cheng
•
Yi-Wei Dai
•
Wei-Yuan Ye
|
2019-04-24
|
Rogue-Gym: A New Challenge for Generalization in Reinforcement Learning
Yuji Kanagawa
•
Tomoyuki Kaneko
|
2019-04-17
|
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
Wei-cheng Kuo
•
Anelia Angelova
•
Jitendra Malik
•
Tsung-Yi Lin
|
2019-04-05
|
Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning
Gabriel V. de la Cruz Jr.
•
Yunshu Du
•
Matthew E. Taylor
|
2019-04-03
|
Truly Proximal Policy Optimization
Yuhui Wang
•
Hao He
•
Chao Wen
•
Xiaoyang Tan
|
2019-03-19
|
Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics
Denis Steckelmacher
•
Hélène Plisnier
•
Diederik M. Roijers
•
Ann Nowé
|
2019-03-11
|
Trust Region-Guided Proximal Policy Optimization
Yuhui Wang
•
Hao He
•
Xiaoyang Tan
•
Yaozhong Gan
|
2019-01-29
|
Combinational Q-Learning for Dou Di Zhu
Yang You
•
Liangwei Li
•
Baisong Guo
•
Weiming Wang
•
Cewu Lu
|
2019-01-24
|
Distillation Strategies for Proximal Policy Optimization
Sam Green
•
Craig M. Vineyard
•
Çetin Kaya Koç
|
2019-01-23
|
On-Policy Trust Region Policy Optimisation with Replay Buffers
Dmitry Kangin
•
Nicolas Pugeault
|
2019-01-18
|
A Logarithmic Barrier Method For Proximal Policy Optimization
Cheng Zeng
•
Hongming Zhang
|
2018-12-16
|
Exploration versus exploitation in reinforcement learning: a stochastic control approach
Haoran Wang
•
Thaleia Zariphopoulou
•
Xunyu Zhou
|
2018-12-04
|
Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL
Bilal Kartal
•
Pablo Hernandez-Leal
•
Matthew E. Taylor
|
2018-11-30
|
Single-Agent Policy Tree Search With Guarantees
Laurent Orseau
•
Levi H. S. Lelis
•
Tor Lattimore
•
Théophane Weber
|
2018-11-27
|
Universal Semi-Supervised Semantic Segmentation
Tarun Kalluri
•
Girish Varma
•
Manmohan Chandraker
•
C V Jawahar
|
2018-11-26
|
Policy Optimization with Model-based Explorations
Feiyang Pan
•
Qingpeng Cai
•
An-Xiang Zeng
•
Chun-Xiang Pan
•
Qing Da
•
Hualin He
•
Qing He
•
Pingzhong Tang
|
2018-11-18
|
On the Complexity of Exploration in Goal-Driven Navigation
Maruan Al-Shedivat
•
Lisa Lee
•
Ruslan Salakhutdinov
•
Eric Xing
|
2018-11-16
|
Equivalent Constraints for Two-View Geometry: Pose Solution/Pure Rotation Identification and 3D Reconstruction
Qi Cai
•
Yuanxin Wu
•
Lilian Zhang
•
Peike Zhang
|
2018-10-13
|
Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs
Yogesh Balaji
•
Hamed Hassani
•
Rama Chellappa
•
Soheil Feizi
|
2018-10-09
|
NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm
Zhichao Lu
•
Ian Whalen
•
Vishnu Boddeti
•
Yashesh Dhebar
•
Kalyanmoy Deb
•
Erik Goodman
•
Wolfgang Banzhaf
|
2018-10-08
|
PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation
Perttu Hämäläinen
•
Amin Babadi
•
Xiaoxiao Ma
•
Jaakko Lehtinen
|
2018-10-05
|
Reinforcement Learning with Perturbed Rewards
Jingkang Wang
•
Yang Liu
•
Bo Li
|
2018-10-02
|
A Fast Globally Linearly Convergent Algorithm for the Computation of Wasserstein Barycenters
Lei Yang
•
Jia Li
•
Defeng Sun
•
Kim-Chuan Toh
|
2018-09-12
|
Adversarial Deep Reinforcement Learning in Portfolio Management
Zhipeng Liang
•
Hao Chen
•
Junhao Zhu
•
Kangkang Jiang
•
Yanran Li
|
2018-08-29
|
Proximal Policy Optimization and its Dynamic Version for Sequence Generation
Yi-Lin Tuan
•
Jinzhi Zhang
•
Yujia Li
•
Hung-yi Lee
|
2018-08-24
|
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits
Julian Zimmert
•
Yevgeny Seldin
|
2018-07-19
|
Gradient Band-based Adversarial Training for Generalized Attack Immunity of A3C Path Finding
Tong Chen
•
Wenjia Niu
•
Yingxiao Xiang
•
Xiaoxuan Bai
•
Jiqiang Liu
•
Zhen Han
•
Gang Li
|
2018-07-18
|
Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization
Xiangxiang Chu
|
2018-07-02
|
Supervised Policy Update for Deep Reinforcement Learning
Quan Vuong
•
Yiming Zhang
•
Keith W. Ross
|
2018-05-29
|
Crawling in Rogue's dungeons with (partitioned) A3C
Andrea Asperti
•
Daniele Cortesi
•
Francesco Sovrano
|
2018-04-23
|
An Adaptive Clipping Approach for Proximal Policy Optimization
Gang Chen
•
Yiming Peng
•
Mengjie Zhang
|
2018-04-17
|
A Brandom-ian view of Reinforcement Learning towards strong-AI
Atrisha Sarkar
|
2018-03-07
|
Variational Inference for Policy Gradient
Tianbing Xu
|
2018-02-21
|
Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces
Gellért Weisz
•
Paweł Budzianowski
•
Pei-Hao Su
•
Milica Gašić
|
2018-02-11
|
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Lasse Espeholt
•
Hubert Soyer
•
Remi Munos
•
Karen Simonyan
•
Volodymyr Mnih
•
Tom Ward
•
Yotam Doron
•
Vlad Firoiu
•
Tim Harley
•
Iain Dunning
•
Shane Legg
•
Koray Kavukcuoglu
|
2018-02-05
|
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations
Xiaoqin Zhang
•
Huimin Ma
|
2018-01-31
|
An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients
Jiaming Song
•
Yuhuai Wu
|
2018-01-17
|
Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design
Daniel Neil
•
Marwin Segler
•
Laura Guasch
•
Mohamed Ahmed
•
Dean Plumbley
•
Matthew Sellwood
•
Nathan Brown
|
2018-01-01
|
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
Felipe Petroski Such
•
Vashisht Madhavan
•
Edoardo Conti
•
Joel Lehman
•
Kenneth O. Stanley
•
Jeff Clune
|
2017-12-18
|
Natural Value Approximators: Learning when to Trust Past Estimates
Zhongwen Xu
•
Joseph Modayil
•
Hado P. Van Hasselt
•
Andre Barreto
•
David Silver
•
Tom Schaul
|
2017-12-01
|
Teaching a Machine to Read Maps with Deep Reinforcement Learning
Gino Brunner
•
Oliver Richter
•
Yuyi Wang
•
Roger Wattenhofer
|
2017-11-20
|
AMBER: Adaptive Multi-Batch Experience Replay for Continuous Action Control
Seungyul Han
•
Youngchul Sung
|
2017-10-12
|
Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
Kyungjae Lee
•
Sungjoon Choi
•
Songhwai Oh
|
2017-09-19
|
Improving Search through A3C Reinforcement Learning based Conversational Agent
Milan Aggarwal
•
Aarushi Arora
•
Shagun Sodhani
•
Balaji Krishnamurthy
|
2017-09-17
|
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Yuhuai Wu
•
Elman Mansimov
•
Shun Liao
•
Roger Grosse
•
Jimmy Ba
|
2017-08-17
|
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
Irina Higgins
•
Arka Pal
•
Andrei A. Rusu
•
Loic Matthey
•
Christopher P Burgess
•
Alexander Pritzel
•
Matthew Botvinick
•
Charles Blundell
•
Alexander Lerchner
|
2017-07-26
|
Learning Transferable Architectures for Scalable Image Recognition
Barret Zoph
•
Vijay Vasudevan
•
Jonathon Shlens
•
Quoc V. Le
|
2017-07-21
|
Proximal Policy Optimization Algorithms
John Schulman
•
Filip Wolski
•
Prafulla Dhariwal
•
Alec Radford
•
Oleg Klimov
|
2017-07-20
|
Noisy Networks for Exploration
Meire Fortunato
•
Mohammad Gheshlaghi Azar
•
Bilal Piot
•
Jacob Menick
•
Ian Osband
•
Alex Graves
•
Vlad Mnih
•
Remi Munos
•
Demis Hassabis
•
Olivier Pietquin
•
Charles Blundell
•
Shane Legg
|
2017-06-30
|
Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement Learning
Sahil Sharma
•
Aravind Suresh
•
Rahul Ramesh
•
Balaraman Ravindran
|
2017-05-20
|
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning
Nat Dilokthanakul
•
Christos Kaplanis
•
Nick Pawlowski
•
Murray Shanahan
|
2017-05-18
|
Equivalence Between Policy Gradients and Soft Q-Learning
John Schulman
•
Xi Chen
•
Pieter Abbeel
|
2017-04-21
|
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
Audrunas Gruslys
•
Will Dabney
•
Mohammad Gheshlaghi Azar
•
Bilal Piot
•
Marc Bellemare
•
Remi Munos
|
2017-04-15
|
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
Yen-Chen Lin
•
Zhang-Wei Hong
•
Yuan-Hong Liao
•
Meng-Li Shih
•
Ming-Yu Liu
•
Min Sun
|
2017-03-08
|
Improving Policy Gradient by Exploring Under-appreciated Rewards
Ofir Nachum
•
Mohammad Norouzi
•
Dale Schuurmans
|
2016-11-28
|
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU
Mohammad Babaeizadeh
•
Iuri Frosio
•
Stephen Tyree
•
Jason Clemons
•
Jan Kautz
|
2016-11-18
|
Sample Efficient Actor-Critic with Experience Replay
Ziyu Wang
•
Victor Bapst
•
Nicolas Heess
•
Volodymyr Mnih
•
Remi Munos
•
Koray Kavukcuoglu
•
Nando de Freitas
|
2016-11-03
|
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
•
Adrià Puigdomènech Badia
•
Mehdi Mirza
•
Alex Graves
•
Timothy P. Lillicrap
•
Tim Harley
•
David Silver
•
Koray Kavukcuoglu
|
2016-02-04
|