1 code implementation • 8 Apr 2024 • David Valensi, Esther Derman, Shie Mannor, Gal Dalal
We show that, given observed delay values, it is sufficient to perform a policy search in the class of Markov policies in order to reach optimal performance, thus extending the deterministic fixed-delay case.
no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant
In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.
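Trajectory-based comparison feedback of this flavor is often modeled with a Bradley-Terry preference model. The sketch below is a toy illustration under that assumption; the linear reward form, feature map, and all names are ours, not taken from the paper.

```python
import numpy as np

# Hypothetical sketch: infer a linear reward w . phi from pairwise trajectory
# preferences via a Bradley-Terry model. Shapes and names are illustrative.

rng = np.random.default_rng(0)

def traj_features(traj):
    # Sum of per-state features along the trajectory (illustrative choice).
    return traj.sum(axis=0)

def fit_reward(pairs, prefs, dim, lr=0.1, steps=500):
    # Logistic regression on feature differences:
    # P(a beats b) = sigmoid(w . (phi_a - phi_b)).
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for (ta, tb), y in zip(pairs, prefs):
            d = traj_features(ta) - traj_features(tb)
            p = 1.0 / (1.0 + np.exp(-w @ d))
            grad += (y - p) * d        # Bernoulli log-likelihood gradient
        w += lr * grad / len(pairs)
    return w

# Synthetic check: label pairs with a hidden weight vector, then recover it.
w_true = np.array([1.0, -0.5])
pairs = [(rng.normal(size=(5, 2)), rng.normal(size=(5, 2))) for _ in range(200)]
prefs = [float(w_true @ (traj_features(a) - traj_features(b)) > 0)
         for a, b in pairs]
w_hat = fit_reward(pairs, prefs, dim=2)   # direction aligns with w_true
```

With deterministic labels the data is linearly separable, so only the direction of `w_hat` is meaningful, not its scale.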
no code implementations • 30 Jan 2023 • Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik
We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy.
no code implementations • 28 Sep 2022 • Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik
This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient.
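One way to picture a tree-expansion policy of this kind (our schematic rendition, not the authors' parametrization) is a softmax whose logit for each first action aggregates, via log-sum-exp, the cumulative discounted rewards of all fixed-depth paths beneath it:

```python
import numpy as np

# Schematic tree-based softmax policy on a tiny deterministic MDP
# (our toy rendition of the idea, not the paper's implementation).

def tree_softmax_policy(P, R, state, depth=2, gamma=0.9, beta=1.0):
    """P[s, a] -> next state (deterministic), R[s, a] -> reward.
    Returns a distribution over first actions."""
    n_actions = P.shape[1]

    def returns(s, d):
        # Cumulative discounted rewards of every length-d action path from s.
        if d == 0:
            return np.array([0.0])
        return np.concatenate([R[s, a] + gamma * returns(P[s, a], d - 1)
                               for a in range(n_actions)])

    def logsumexp(x):
        m = x.max()
        return m + np.log(np.exp(x - m).sum())

    # Each first action's logit aggregates its whole subtree.
    logits = np.array([logsumexp(beta * (R[state, a]
                                         + gamma * returns(P[state, a], depth - 1)))
                       for a in range(n_actions)])
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy 2-state, 2-action MDP where action 1 always yields the higher reward.
P = np.array([[0, 1], [0, 1]])           # next-state table
R = np.array([[0.0, 1.0], [0.0, 1.0]])   # reward table
pi = tree_softmax_policy(P, R, state=0, depth=2)   # favors action 1
```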
1 code implementation • 5 Jul 2022 • Benjamin Fuhrer, Yuval Shpigelman, Chen Tessler, Shie Mannor, Gal Chechik, Eitan Zahavi, Gal Dalal
As communication protocols evolve, datacenter network utilization increases.
1 code implementation • 30 May 2022 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
no code implementations • 28 Jan 2022 • Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal
Some of the most powerful reinforcement learning frameworks use planning for action selection.
no code implementations • ICLR 2022 • Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit
We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.
1 code implementation • NeurIPS 2021 • Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik
We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance than the original pre-trained agent, even when given access to the exact state and reward in future steps.
no code implementations • 18 Feb 2021 • Chen Tessler, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, Shie Mannor
We approach the task of network congestion control in datacenters using Reinforcement Learning (RL).
2 code implementations • ICLR 2021 • Esther Derman, Gal Dalal, Shie Mannor
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
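A minimal way to picture the $m$-step delay (an illustrative wrapper of our own, not the paper's implementation) is an action queue: the action executed now was submitted $m$ steps ago, and the agent's effective state is the observation paired with the queue of pending actions.

```python
from collections import deque

# Illustrative m-step execution-delay wrapper (names are ours): each
# submitted action waits m steps in a queue before being executed.

class DelayedEnv:
    def __init__(self, env, m, default_action):
        self.env = env
        self.queue = deque([default_action] * m, maxlen=m)

    def step(self, action):
        executed = self.queue.popleft()   # the action chosen m steps ago
        self.queue.append(action)         # the current action waits its turn
        obs, reward = self.env.step(executed)
        # Effective state: observation plus pending-action queue.
        return (obs, tuple(self.queue)), reward

# Tiny deterministic environment for a smoke test: reward equals the action.
class Counter:
    def __init__(self):
        self.x = 0
    def step(self, a):
        self.x += a
        return self.x, float(a)

env = DelayedEnv(Counter(), m=2, default_action=0)
(s1, q1), r1 = env.step(1)   # executes the default action 0
(s2, q2), r2 = env.step(1)   # still executing a default 0
(s3, q3), r3 = env.step(0)   # now the first submitted 1 is executed
```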
no code implementations • 8 Dec 2020 • Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu
With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.
no code implementations • 20 Nov 2019 • Gal Dalal, Balazs Szorenyi, Gugan Thoppe
Algorithms such as these have two iterates, $\theta_n$ and $w_n$, which are updated using two distinct stepsize sequences, $\alpha_n$ and $\beta_n$, respectively.
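The two-stepsize structure can be illustrated on a toy coupled iteration (our own example, not any specific TD variant): with $\beta_n \gg \alpha_n$, the fast iterate $w_n$ tracks $\theta_n$, while the slow iterate $\theta_n$ drifts toward the coupled equilibrium.

```python
# Toy two-timescale iteration (illustrative): w_n chases theta_n on the fast
# timescale beta_n = n^(-0.6), while theta_n moves toward the point where
# 1 - w = 0 on the slow timescale alpha_n = 1/n.
theta, w = 0.0, 0.0
for n in range(1, 200001):
    alpha = 1.0 / n              # slow stepsize
    beta = 1.0 / n ** 0.6        # fast stepsize
    w += beta * (theta - w)      # fast iterate: track theta
    theta += alpha * (1.0 - w)   # slow iterate: drive the tracked value to 1
# Both iterates settle at the coupled fixed point theta = w = 1.
```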
no code implementations • NeurIPS 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.
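In a small known MDP, an $h$-step greedy lookahead (a toy sketch of ours, with deterministic transitions and a terminal value estimate at the horizon) reduces to a short recursion:

```python
import numpy as np

# Sketch of an h-step lookahead policy: choose the first action of the best
# h-step reward sequence, bootstrapping with a value estimate V at the horizon.

def lookahead_action(P, R, V, s, h, gamma=0.9):
    """Deterministic transitions: P[s, a] is the next state, R[s, a] the reward."""
    def q(s, d):
        if d == 0:
            return V[s]
        return max(R[s, a] + gamma * q(P[s, a], d - 1)
                   for a in range(P.shape[1]))
    return max(range(P.shape[1]),
               key=lambda a: R[s, a] + gamma * q(P[s, a], h - 1))

# Toy chain where a short-term sacrifice (action 1, reward 0) leads to a
# high-reward state, while action 0 collects reward 1 and stays put.
P = np.array([[0, 1], [1, 1]])
R = np.array([[1.0, 0.0], [10.0, 10.0]])
V = np.zeros(2)
a_myopic = lookahead_action(P, R, V, s=0, h=1)   # grabs the immediate 1
a_deep = lookahead_action(P, R, V, s=0, h=2)     # sees the 10 behind action 1
```

Deeper lookahead changes the chosen action precisely because it sees past the myopic reward.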
no code implementations • 6 Sep 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.
no code implementations • ICML 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.
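For reference, the classic alternation on a tiny tabular MDP (the numbers are our own illustration): exact policy evaluation by solving a linear system, then greedy improvement, repeated until the policy is stable.

```python
import numpy as np

# Tabular Policy Iteration on a toy 2-state, 2-action MDP (illustrative).
gamma = 0.9
# P[a, s, s'] transition probabilities, R[s, a] rewards.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],    # action 0: go to state 0
              [[0.0, 1.0], [0.0, 1.0]]])   # action 1: go to state 1
R = np.array([[0.0, 2.0],                  # state 0: a0 -> 0, a1 -> 2
              [0.0, 2.0]])                 # state 1: a0 -> 0, a1 -> 2

policy = np.zeros(2, dtype=int)
while True:
    # Policy evaluation: solve V = r_pi + gamma * P_pi V exactly.
    P_pi = np.array([P[policy[s], s] for s in range(2)])
    r_pi = np.array([R[s, policy[s]] for s in range(2)])
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    # Policy improvement: act greedily with respect to V.
    Q = R + gamma * np.array([[P[a, s] @ V for a in range(2)]
                              for s in range(2)])
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
# Optimal: always take action 1, V = 2 / (1 - gamma) = 20 in both states.
```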
6 code implementations • 26 Jan 2018 • Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru, Yuval Tassa
We address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated.
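A safety-layer style correction can be sketched as a closed-form projection (our simplified single-constraint reading): if a linearized constraint $c + g^\top a \le 0$ is violated by the proposed action, project the action back onto the constraint set.

```python
import numpy as np

# Sketch of a closed-form safety projection for one linearized constraint
# c + g . a <= 0 on the action (our simplified illustration).

def safe_action(a, g, c):
    """Project a onto {a' : c + g . a' <= 0} in closed form."""
    lam = max(0.0, (c + g @ a) / (g @ g))   # active-constraint multiplier
    return a - lam * g

g = np.array([1.0, 0.0])
a = np.array([2.0, 3.0])
a_safe = safe_action(a, g, c=-1.0)   # constraint reads a[0] <= 1
```

When the proposed action already satisfies the constraint, the multiplier is zero and the action passes through unchanged.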
no code implementations • 4 Apr 2017 • Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor
TD(0) is one of the most commonly used algorithms in reinforcement learning.
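For context, the textbook tabular TD(0) update, $V(s) \leftarrow V(s) + \alpha\,(r + \gamma V(s') - V(s))$, on a deterministic two-state cycle (toy numbers of our own):

```python
# Tabular TD(0) on a deterministic two-state cycle: state 0 -> 1 -> 0 -> ...,
# with reward 1 collected in state 1 and 0 in state 0 (illustrative toy).
gamma, alpha = 0.9, 0.1
V = [0.0, 0.0]
s = 0
for _ in range(10000):
    s_next = 1 - s
    r = float(s == 1)                              # reward of the current state
    V[s] += alpha * (r + gamma * V[s_next] - V[s])  # TD(0) update
    s = s_next
# True fixed point: V[1] = 1 / (1 - gamma^2), V[0] = gamma * V[1].
```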
no code implementations • 15 Mar 2017 • Gal Dalal, Balazs Szorenyi, Gugan Thoppe, Shie Mannor
Using this, we provide a concentration bound, the first such result for a two-timescale stochastic approximation (SA) scheme.
no code implementations • 20 Dec 2016 • Raphael Canyasse, Gal Dalal, Shie Mannor
In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF).
no code implementations • 30 Nov 2016 • Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel
We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, thereby making hierarchical long-term assessment and planning tractable for large power systems.
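The nearest-neighbor proxy idea can be sketched as follows (a toy stand-in of ours: the cost function, bank size, and dimensions are illustrative, not the paper's setup): precompute exact costs for a bank of commitment vectors, then answer new queries with the cost of the nearest stored neighbor.

```python
import numpy as np

# Toy nearest-neighbor cost proxy: trade an expensive exact solve for a
# lookup in a bank of precomputed (commitment vector, cost) pairs.

rng = np.random.default_rng(0)

def true_cost(u):
    """Stand-in for an expensive optimization solve (hypothetical quadratic)."""
    return float(u @ u + 2.0 * u.sum())

bank = rng.normal(size=(500, 4))               # precomputed commitment vectors
costs = np.array([true_cost(u) for u in bank]) # their exact costs

def nn_cost(query):
    i = np.argmin(np.linalg.norm(bank - query, axis=1))
    return costs[i]

q = bank[42] + 0.01 * rng.normal(size=4)       # a query near a stored point
approx = nn_cost(q)                            # close to true_cost(q)
```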
no code implementations • 6 Mar 2016 • Gal Dalal, Elad Gilboa, Shie Mannor
The power grid is a complex and vital system that necessitates careful reliability management.
no code implementations • 19 Jul 2015 • Gal Dalal, Shie Mannor
In this work we solve the day-ahead unit commitment (UC) problem by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling.
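A toy rendition of the MDP view (all costs and the load profile below are our own illustrative numbers, not the paper's model): state = (hour, unit status), action = the unit's on/off status for the coming hour, and backward induction recovers a min-cost commitment schedule.

```python
import numpy as np

# Toy day-ahead unit commitment as an MDP: one unit, 24 hours, cost =
# fuel while on + startup cost on switching on + penalty for unserved demand.
H = 24
demand = np.array([0.0] * 8 + [3.0] * 8 + [0.0] * 8)   # hypothetical load
CAPACITY, FUEL, STARTUP, SHORTFALL = 4.0, 1.0, 2.0, 5.0

def stage_cost(hour, status, turned_on):
    supply = CAPACITY if status else 0.0
    unserved = max(0.0, demand[hour] - supply)
    return FUEL * status + STARTUP * turned_on + SHORTFALL * unserved

# Backward induction: V[h, s] = minimal cost from hour h onward with status s.
V = np.zeros((H + 1, 2))
policy = np.zeros((H, 2), dtype=int)
for h in range(H - 1, -1, -1):
    for s in (0, 1):
        costs = [stage_cost(h, a, int(a == 1 and s == 0)) + V[h + 1, a]
                 for a in (0, 1)]
        policy[h, s] = int(np.argmin(costs))
        V[h, s] = min(costs)
```

The resulting policy keeps the unit off in the low-demand hours and starts it up only for the peak block.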