Q-Learning
388 papers with code • 0 benchmarks • 2 datasets
The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
(Image credit: Playing Atari with Deep Reinforcement Learning)
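The one-step update behind this policy-learning goal can be sketched in a few lines. A minimal tabular version, with an assumed environment interface `step(s, a) -> (next_state, reward, done)` and illustrative hyperparameters (none of this comes from a specific paper on this page):

```python
import random

def q_learning(n_states, n_actions, episodes, step, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning sketch; `step(s, a) -> (s2, r, done)` is an assumed
    environment interface, and alpha/gamma/eps are illustrative defaults."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            # temporal-difference update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r + gamma * max(Q[s2]) * (not done)
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

On a toy two-state chain where one action terminates with reward 1, `Q[0][1]` converges toward 1, i.e. the learned table encodes which action to take in which state.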
Benchmarks
These leaderboards are used to track progress in Q-Learning.
Libraries
Use these libraries to find Q-Learning models and implementations.
Latest papers with no code
Unified ODE Analysis of Smooth Q-Learning Algorithms
This work applies the ordinary differential equation (ODE) approach to prove the convergence of asynchronous Q-learning modeled as a continuous-time switching system; notions from switching system theory are used to establish its asymptotic stability without explicit Lyapunov arguments.
Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty
Owing to the martingale perspective in Jia and Zhou (2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, which captures the variability of the value-to-go along the trajectory.
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm.
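The bandit view described here treats a whole response as a single arm scored by its total log-probability. A hedged sketch of the standard DPO objective under that view, where the summed token log-probs of the chosen (`w`) and rejected (`l`) responses are assumed inputs (variable names and the default `beta` are illustrative, not from the paper):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Bandit-style DPO loss sketch: -log sigmoid of the scaled margin between
    policy-vs-reference log-ratios of the preferred and rejected responses."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

When the policy matches the reference, the margin is zero and the loss is log 2; widening the preferred response's log-ratio over the rejected one drives the loss down, which is the sense in which the response-level log-ratio acts as an implicit reward.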
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
We evaluate our tracker on several high-fidelity environments with challenging situations, such as distraction and occlusion.
Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement
To the best of our knowledge, this study represents a pioneering effort in using Reinforcement Learning for firebreak placement, offering promising perspectives in fire prevention and landscape management.
Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA
Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize.
Traffic Signal Control and Speed Offset Coordination Using Q-Learning for Arterial Road Networks
We evaluate the performance of the proposed arterial traffic control strategy using microscopic traffic simulations of an arterial corridor with seven intersections near the I-710 freeway.
Deep Reinforcement Learning Control for Disturbance Rejection in a Nonlinear Dynamic System with Parametric Uncertainty
This work describes a technique for active rejection of multiple independent, time-correlated stochastic disturbances for a nonlinear flexible inverted-pendulum-on-cart system with uncertain model parameters.
Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution
Recent reinforcement learning approaches have shown that bang-bang policies are surprisingly effective at solving continuous control benchmarks.
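The idea of adaptive control resolution can be illustrated by how a continuous action interval is discretized: two bins recover a bang-bang policy, and growing the bin count refines control. A minimal sketch (the helper name and interface are assumptions for illustration, not the paper's API):

```python
def action_bins(low, high, n):
    """Return n evenly spaced discrete actions covering [low, high].
    n=2 yields a bang-bang action set {low, high}; larger n refines resolution."""
    return [low + (high - low) * i / (n - 1) for i in range(n)]
```

For example, `action_bins(-1.0, 1.0, 2)` gives the bang-bang set `[-1.0, 1.0]`, while `action_bins(-1.0, 1.0, 5)` gives `[-1.0, -0.5, 0.0, 0.5, 1.0]`.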
Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning
Accounting for the uncertainty of value functions boosts exploration in Reinforcement Learning (RL).