Search Results for author: DiJia Su

Found 5 papers, 2 papers with code

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

1 code implementation • 21 Feb 2024 • Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul McVay, Michael Rabbat, Yuandong Tian

We fine-tune this model to obtain Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^*$ implementation that was used for training initially.

Decision Making
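The abstract above compares Searchformer against $A^*$ by counting search steps. As a point of reference, the sketch below is a generic A* implementation on a toy 4-connected grid (not Sokoban, and not the authors' code) that counts node expansions, the kind of quantity such a "search steps" comparison is based on; the grid, heuristic, and function name are illustrative assumptions.

```python
import heapq

def astar_steps(grid, start, goal):
    """Generic A* on a 4-connected grid (0 = free, 1 = wall).

    Returns (path_length, expansions). Counting expansions is a stand-in
    for the 'search steps' metric mentioned above -- illustrative only.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start)]
    best_g = {start: 0}
    expansions = 0
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry superseded by a cheaper path
        expansions += 1
        if node == goal:
            return g, expansions
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None, expansions
```

A learned planner that reaches the goal while expanding fewer nodes than this baseline would, in this toy setting, "use fewer search steps" in the sense of the abstract.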

A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games

2 code implementations • 18 Jul 2022 • Zihan Ding, DiJia Su, Qinghua Liu, Chi Jin

This paper proposes new, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games.

Atari Games, Q-Learning
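The paper above learns policies for two-player zero-sum Markov games with deep RL. As a minimal illustration of the underlying solution concept (not the authors' algorithm), the sketch below computes the pure-strategy maximin and minimax values of a single zero-sum stage game; when they coincide, the game has a saddle point and that value is non-exploitable by either player. The function name and example payoff matrix are assumptions for illustration.

```python
def pure_saddle_value(payoff):
    """Pure-strategy maximin/minimax for a zero-sum matrix game.

    `payoff[i][j]` is the row player's gain (= column player's loss).
    Returns (maximin, minimax, has_saddle_point). Illustrative sketch:
    general zero-sum games need mixed strategies, handled here only in
    the saddle-point case.
    """
    # Row player: pick the row whose worst-case payoff is largest.
    maximin = max(min(row) for row in payoff)
    # Column player: pick the column whose best-case loss is smallest.
    minimax = min(max(col) for col in zip(*payoff))
    return maximin, minimax, maximin == minimax
```

For a payoff matrix like `[[3, 1], [4, 2]]` the two values coincide at 2, so neither player can gain by deviating unilaterally, which is the non-exploitability property the paper's title refers to.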

Narrowing the Coordinate-frame Gap in Behavior Prediction Models: Distillation for Efficient and Accurate Scene-centric Motion Forecasting

no code implementations • 8 Jun 2022 • DiJia Su, Bertrand Douillard, Rami Al-Rfou, Cheolho Park, Benjamin Sapp

These models are intrinsically invariant to translation and rotation between scene elements and are the best-performing on public leaderboards, but they scale quadratically with the number of agents and scene elements.

Knowledge Distillation, Motion Forecasting +2
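The quadratic scaling mentioned in the abstract above is the usual cost of full pairwise interaction (e.g. self-attention) over scene elements. The back-of-envelope sketch below (a generic illustration, not the paper's model) shows why doubling the number of elements quadruples the interaction count.

```python
def attention_pair_count(num_elements):
    """Number of (query, key) pairs in full self-attention over
    `num_elements` scene elements -- every element attends to every
    element, including itself. Generic sketch of quadratic scaling."""
    return num_elements * num_elements

# Doubling the scene size quadruples the pairwise-interaction cost.
growth = attention_pair_count(20) / attention_pair_count(10)
```

This is the cost the paper attacks by distilling a pairwise-interaction teacher into a cheaper scene-centric student.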

MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning

no code implementations • 23 Feb 2021 • DiJia Su, Jason D. Lee, John M. Mulvey, H. Vincent Poor

We consider a setting that lies between pure offline reinforcement learning (RL) and pure online RL called deployment constrained RL in which the number of policy deployments for data sampling is limited.

Reinforcement Learning (RL), Uncertainty Quantification
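The setting described above limits how many times a policy may be deployed to collect data. The sketch below is a schematic deployment-constrained training loop (a hypothetical illustration, not MUSBO itself): each of a fixed number of deployments gathers one batch of experience, and all policy improvement between deployments happens offline on the accumulated dataset. The function name, toy "policy", and batch contents are stand-in assumptions.

```python
import random

def deployment_constrained_loop(num_deployments=5, batch_per_deploy=100, seed=0):
    """Schematic deployment-constrained RL loop (hypothetical sketch).

    Data collection is restricted to `num_deployments` policy deployments;
    between deployments, the 'policy' (here just a scalar stand-in for
    parameters) is updated offline on all data gathered so far.
    """
    rng = random.Random(seed)
    dataset = []
    policy = 0.0  # stand-in for policy parameters
    for _ in range(num_deployments):
        # One deployment: sample a fixed batch of transitions with the
        # current policy (rewards here are just random placeholders).
        batch = [(rng.random(), policy) for _ in range(batch_per_deploy)]
        dataset.extend(batch)
        # Offline (batch) improvement using the full dataset -- no further
        # environment interaction until the next deployment.
        policy += sum(reward for reward, _ in dataset) / len(dataset)
    return len(dataset), policy
```

The point of the structure is that sample collection (the deployments) and policy optimization (the offline updates) are decoupled, which is what distinguishes this setting from pure online RL.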
