In the past several years, the field of machine learning has seen an explosion of interest and excitement, with hundreds or thousands of algorithms developed for different tasks every year.
With the aim to develop a strong 2048 playing program, we employ temporal difference learning with systematic n-tuple networks.
We propose several modifications to the existing learning algorithm to make it more suitable under the financial trading setting, namely 1.
Given a text description, most existing semantic parsers synthesize a program in one shot.
Relative to a set of randomly generated problem instances, agents trained through reinforcement learning techniques are capable of producing short quantum programs which generate high quality solutions on both types of quantum resources.
The basic objective of this paper is to reach the same results using reinforcement learning with general function approximators that can be achieved by using the classical Q lookup table on small input samples.
Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming.
This paper proposes a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled, `open loop - closed loop', approach.
Exploration is a key component of successful reinforcement learning, but optimal approaches are computationally intractable, so researchers have focused on hand-designing mechanisms based on exploration bonuses and intrinsic reward, some inspired by curious behavior in natural systems.