On-Policy TD Control

Sarsa Lambda

Sarsa($\lambda$) extends eligibility traces to action-value methods. It has the same update rule as TD($\lambda$), but uses the action-value form of the TD error:

$$ \delta_{t} = R_{t+1} + \gamma\hat{q}\left(S_{t+1}, A_{t+1}, \mathbf{w}_{t}\right) - \hat{q}\left(S_{t}, A_{t}, \mathbf{w}_{t}\right) $$

and the action-value form of the eligibility trace:

$$ \mathbf{z}_{-1} = \mathbf{0} $$

$$ \mathbf{z}_{t} = \gamma\lambda\mathbf{z}_{t-1} + \nabla\hat{q}\left(S_{t}, A_{t}, \mathbf{w}_{t}\right), \quad 0 \leq t \leq T $$
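The update rule combining these pieces can be sketched in code. Below is a minimal, hypothetical example of semi-gradient Sarsa(λ) with linear function approximation on a toy 5-state chain MDP (the environment, one-hot features, and ε-greedy policy are illustrative assumptions, not from the source):

```python
import numpy as np

# Toy setup (assumptions): 5-state chain, actions 0 = left / 1 = right,
# reward +1 for moving right out of the last state, one-hot features.
N_STATES, N_ACTIONS = 5, 2
GAMMA, LAM, ALPHA, EPS = 0.9, 0.8, 0.1, 0.1
rng = np.random.default_rng(0)

def features(s, a):
    """One-hot feature vector x(s, a); with linear q-hat, grad q-hat = x."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def step(s, a):
    """Chain dynamics: moving right from the last state ends the episode."""
    if a == 1:
        if s == N_STATES - 1:
            return None, 1.0          # terminal transition, reward +1
        return s + 1, 0.0
    return max(s - 1, 0), 0.0

def q(w, s, a):
    return w @ features(s, a)

def policy(w, s):
    """Epsilon-greedy action selection."""
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([q(w, s, a) for a in range(N_ACTIONS)]))

w = np.zeros(N_STATES * N_ACTIONS)
for episode in range(200):
    s, a = 0, policy(w, 0)
    z = np.zeros_like(w)              # eligibility trace z_{-1} = 0
    while s is not None:
        s_next, r = step(s, a)
        delta = r - q(w, s, a)        # TD error, terminal part so far
        z = GAMMA * LAM * z + features(s, a)   # z_t = gamma*lambda*z_{t-1} + grad
        if s_next is not None:
            a_next = policy(w, s_next)
            delta += GAMMA * q(w, s_next, a_next)  # bootstrap on (S_{t+1}, A_{t+1})
            s, a = s_next, a_next
        else:
            s = None                  # episode ends; no bootstrap term
        w += ALPHA * delta * z        # w_{t+1} = w_t + alpha * delta_t * z_t
```

After training, the learned action values should prefer moving right toward the rewarding terminal state. Note that with one-hot features this reduces to tabular Sarsa(λ) with accumulating traces.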

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
