Distributional Reinforcement Learning
31 papers with code • 0 benchmarks • 0 datasets
The value distribution is the distribution of the random return received by a reinforcement learning agent. It has been used for specific purposes such as implementing risk-aware behaviour.
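Formally, the agent receives a random return Z whose expectation is the value Q. Like Q, this random return is described by a recursive equation, but one of a distributional nature: the equation holds as an equality between distributions rather than between expected values.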
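Written out, the distributional Bellman equation is Z(x, a) =_D R(x, a) + γ Z(X', A'), where =_D denotes equality in distribution, X' is the next state and A' the next action chosen by the policy. Categorical (C51-style) methods represent each Z by probabilities on a fixed, evenly spaced grid of atoms. After the backup shifts and scales the atoms, they project the result back onto the grid. The following is a minimal NumPy sketch of that projection for a single transition. It assumes a deterministic reward, and the function name and example distribution are illustrative only.

```python
import numpy as np

def categorical_projection(next_probs, reward, gamma, support):
    """Distribution of reward + gamma * Z', projected back onto `support`.

    Z' takes value support[j] with probability next_probs[j]; the support is a
    fixed, evenly spaced grid of atoms, as in categorical (C51-style) methods.
    """
    next_probs = np.asarray(next_probs, dtype=float)
    v_min, v_max = support[0], support[-1]
    delta = support[1] - support[0]
    target = np.zeros_like(next_probs)
    # Apply the backup to every atom and clip to the representable range.
    tz = np.clip(reward + gamma * support, v_min, v_max)
    # Fractional grid index of each shifted atom (clipped to guard against rounding).
    b = np.clip((tz - v_min) / delta, 0, len(support) - 1)
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    for j in range(len(support)):
        if lower[j] == upper[j]:
            # Shifted atom lands exactly on a grid point: keep all its mass there.
            target[lower[j]] += next_probs[j]
        else:
            # Otherwise split its mass between the two neighbouring atoms.
            target[lower[j]] += next_probs[j] * (upper[j] - b[j])
            target[upper[j]] += next_probs[j] * (b[j] - lower[j])
    return target

support = np.linspace(-10.0, 10.0, 51)   # 51 atoms on [-10, 10]
next_probs = np.full(51, 1.0 / 51)       # hypothetical uniform Z(x', a')
target = categorical_projection(next_probs, reward=1.0, gamma=0.9, support=support)
print(round(target.sum(), 6))             # 1.0
print(round(target @ support, 6))         # 1.0 = reward + gamma * E[Z'] (no clipping here)
```

Because mass is split linearly between neighbouring atoms, the projection preserves the mean whenever no atom is clipped at the ends of the grid.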
Latest papers
A Distributional Analogue to the Successor Representation
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.
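The separation of transition structure and reward can be illustrated in a simple tabular setting. Each trajectory under a fixed policy has a discounted state-occupancy vector m = Σ_t γ^t onehot(s_t). For a state-based reward vector r, the return of that trajectory is exactly m · r. Samples of m therefore depend only on the transitions, yet they yield samples of the return distribution for any reward. The sketch below is a Monte Carlo illustration under those assumptions: a hypothetical 3-state chain, truncated trajectories, and state-only rewards. It is not the paper's learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov chain induced by a fixed policy.
P = np.array([[0.1, 0.9, 0.0],
              [0.0, 0.1, 0.9],
              [0.5, 0.0, 0.5]])
gamma, horizon, n_samples = 0.9, 100, 500

def sample_occupancy(start):
    """Discounted occupancy sum_t gamma^t * onehot(s_t) of one truncated trajectory."""
    occ, s = np.zeros(len(P)), start
    for t in range(horizon):        # truncation: gamma**100 is about 2.7e-5
        occ[s] += gamma ** t
        s = rng.choice(len(P), p=P[s])
    return occ

# Transition structure only: no reward is used here.
occupancies = np.stack([sample_occupancy(0) for _ in range(n_samples)])

# Any state-based reward vector then yields samples of the return distribution.
for r in (np.array([0.0, 0.0, 1.0]), np.array([1.0, -1.0, 0.0])):
    returns = occupancies @ r
    print(r, returns.mean().round(3), returns.std().round(3))
```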
A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning
Distributional Reinforcement Learning (RL) estimates the return distribution mainly by learning quantile values through minimizing the quantile Huber loss function. This loss entails a threshold parameter that is often selected heuristically or via hyperparameter search, which may not generalize well and can be suboptimal.
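For reference, the standard quantile Huber loss (in the form used by QR-DQN) has a threshold parameter κ. As κ shrinks, the loss approaches the pure quantile-regression loss. As κ grows, the Huber term becomes a scaled squared error. The following is a minimal NumPy sketch of that baseline loss. It is not the robust variant proposed in the paper, and the target samples in the usage lines are hypothetical.

```python
import numpy as np

def huber(u, kappa):
    """Elementwise Huber loss: quadratic for |u| <= kappa, linear beyond."""
    abs_u = np.abs(u)
    return np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss in the QR-DQN form (kappa must be > 0).

    pred_quantiles: N estimates theta_i at midpoints tau_i = (2i - 1) / (2N).
    target_samples: M samples of the Bellman target r + gamma * Z(x', a').
    """
    theta = np.asarray(pred_quantiles, dtype=float)
    target = np.asarray(target_samples, dtype=float)
    n = len(theta)
    taus = (np.arange(n) + 0.5) / n
    # Pairwise errors u_ij = target_j - theta_i, shape (N, M).
    u = target[None, :] - theta[:, None]
    # Asymmetric weight |tau_i - 1{u_ij < 0}| times the Huber term scaled by 1 / kappa.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    rho = weight * huber(u, kappa) / kappa
    # Average over target samples, sum over quantiles.
    return rho.mean(axis=1).sum()

# Hypothetical targets: 1001 evenly spaced points on [0, 1] (roughly Uniform(0, 1)).
targets = np.linspace(0.0, 1.0, 1001)
taus = (np.arange(8) + 0.5) / 8
good = quantile_huber_loss(taus, targets, kappa=0.01)        # true quantiles of U(0, 1)
shifted = quantile_huber_loss(taus + 0.5, targets, kappa=0.01)
print(good < shifted)  # True
```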
Distributional Bellman Operators over Mean Embeddings
We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
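To illustrate a Bellman update carried out directly on a finite-dimensional mean embedding m(Z) = E[φ(Z)], the sketch below uses polynomial features φ(z) = (1, z, ..., z^K). With these features, the binomial theorem makes the backup Z ↦ r + γZ an exact linear map on the embedding. For general feature maps, no such exact linear update exists, and it must be approximated. The choice of moments, the deterministic reward, and the function names here are assumptions for illustration. They are not the features or operator used in the paper.

```python
import numpy as np
from math import comb

def moment_embedding(values, probs, k_max):
    """Mean embedding E[phi(Z)] with polynomial features phi(z) = (1, z, ..., z^k_max)."""
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    return np.array([probs @ values ** k for k in range(k_max + 1)])

def bellman_matrix(reward, gamma, k_max):
    """Linear map B with B @ m(Z) = m(reward + gamma * Z), from the binomial theorem:
    E[(r + gamma Z)^k] = sum_j C(k, j) r^(k-j) gamma^j E[Z^j]."""
    B = np.zeros((k_max + 1, k_max + 1))
    for k in range(k_max + 1):
        for j in range(k + 1):
            B[k, j] = comb(k, j) * reward ** (k - j) * gamma ** j
    return B

values, probs = [-1.0, 0.0, 2.0], [0.2, 0.5, 0.3]   # hypothetical Z(x', a')
reward, gamma, k_max = 1.0, 0.9, 4

# Update computed purely in embedding space vs. embedding of the updated distribution.
via_operator = bellman_matrix(reward, gamma, k_max) @ moment_embedding(values, probs, k_max)
direct = moment_embedding(reward + gamma * np.asarray(values), probs, k_max)
print(np.allclose(via_operator, direct))  # True
```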
Estimation and Inference in Distributional Reinforcement Learning
The paper's sample-complexity analysis implies that the distributional policy evaluation problem can be solved sample-efficiently.
Variance Control for Distributional Reinforcement Learning
Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting.
Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning
We consider the problem of learning models for risk-sensitive reinforcement learning.
Distributional constrained reinforcement learning for supply chain optimization
We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in RL.
Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints
In safety-critical robotic tasks, potential failures must be reduced, and multiple constraints must be met, such as avoiding collisions, limiting energy consumption, and maintaining balance.
Risk-Sensitive Policy with Distributional Reinforcement Learning
Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome.
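As a contrast to maximising the expected outcome, a distributional agent can rank actions by a risk measure of the return distribution. The following is a minimal sketch. It assumes each action's return distribution is represented by N equally weighted quantiles and uses conditional value-at-risk (CVaR, the mean of the worst α-fraction of returns) as the risk measure. The quantile values are made up for illustration, and this is not the method of the paper above.

```python
import numpy as np

def cvar_from_quantiles(quantiles, alpha):
    """Approximate CVaR_alpha from N equally weighted quantile estimates:
    the mean of the lowest ceil(alpha * N) quantiles."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))  # number of lowest quantiles kept
    return q[:k].mean()

# Hypothetical return quantiles for two actions (N = 5 each).
safe = [0.8, 0.9, 1.0, 1.1, 1.2]     # mean 1.0, narrow spread
risky = [-3.0, 0.5, 1.5, 2.5, 4.0]   # mean 1.1, heavy lower tail
actions = {"safe": safe, "risky": risky}

greedy_mean = max(actions, key=lambda a: np.mean(actions[a]))
greedy_cvar = max(actions, key=lambda a: cvar_from_quantiles(actions[a], alpha=0.2))
print(greedy_mean, greedy_cvar)  # risky safe
```

The expected-value criterion prefers the action with the slightly higher mean. The CVaR criterion instead prefers the action whose worst outcomes are least bad.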
Intelligent Resource Allocation in Joint Radar-Communication With Graph Neural Networks
In this paper, we propose a framework for intelligent vehicles to perform joint radar-communication (JRC) with minimal prior knowledge of the system model and a tunable performance balance, in an environment where surrounding vehicles periodically perform radar detection, as is typical in contemporary protocols.