1 code implementation • 8 Apr 2024 • David Valensi, Esther Derman, Shie Mannor, Gal Dalal
We show that, given observed delay values, it is sufficient to perform a policy search in the class of Markov policies in order to reach optimal performance, thus extending the deterministic fixed-delay case.
no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant
In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.
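Trajectory-based comparison feedback of this flavor is often modeled with a Bradley-Terry preference model. The sketch below is a toy illustration under that assumption; the linear reward form, feature map, and all names are ours, not taken from the paper.

```python
import numpy as np

# Hypothetical sketch: infer a linear reward w . phi from pairwise trajectory
# preferences via a Bradley-Terry model. Shapes and names are illustrative.

rng = np.random.default_rng(0)

def traj_features(traj):
    # Sum of per-state features along the trajectory (illustrative choice).
    return traj.sum(axis=0)

def fit_reward(pairs, prefs, dim, lr=0.1, steps=500):
    # Logistic regression on feature differences:
    # P(a beats b) = sigmoid(w . (phi_a - phi_b)).
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for (ta, tb), y in zip(pairs, prefs):
            d = traj_features(ta) - traj_features(tb)
            p = 1.0 / (1.0 + np.exp(-w @ d))
            grad += (y - p) * d        # Bernoulli log-likelihood gradient
        w += lr * grad / len(pairs)
    return w

# Synthetic check: label pairs with a hidden weight vector, then recover it.
w_true = np.array([1.0, -0.5])
pairs = [(rng.normal(size=(5, 2)), rng.normal(size=(5, 2))) for _ in range(200)]
prefs = [float(w_true @ (traj_features(a) - traj_features(b)) > 0)
         for a, b in pairs]
w_hat = fit_reward(pairs, prefs, dim=2)   # direction aligns with w_true
```

With deterministic labels the data is linearly separable, so only the direction of `w_hat` is meaningful, not its scale.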
no code implementations • 30 Jan 2023 • Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik
We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy.
no code implementations • 28 Sep 2022 • Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik
This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient.
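One way to picture a tree-expansion policy of this kind (our schematic rendition, not the authors' parametrization) is a softmax whose logit for each first action aggregates, via log-sum-exp, the cumulative discounted rewards of all fixed-depth paths beneath it:

```python
import numpy as np

# Schematic tree-based softmax policy on a tiny deterministic MDP
# (our toy rendition of the idea, not the paper's implementation).

def tree_softmax_policy(P, R, state, depth=2, gamma=0.9, beta=1.0):
    """P[s, a] -> next state (deterministic), R[s, a] -> reward.
    Returns a distribution over first actions."""
    n_actions = P.shape[1]

    def returns(s, d):
        # Cumulative discounted rewards of every length-d action path from s.
        if d == 0:
            return np.array([0.0])
        return np.concatenate([R[s, a] + gamma * returns(P[s, a], d - 1)
                               for a in range(n_actions)])

    def logsumexp(x):
        m = x.max()
        return m + np.log(np.exp(x - m).sum())

    # Each first action's logit aggregates its whole subtree.
    logits = np.array([logsumexp(beta * (R[state, a]
                                         + gamma * returns(P[state, a], depth - 1)))
                       for a in range(n_actions)])
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy 2-state, 2-action MDP where action 1 always yields the higher reward.
P = np.array([[0, 1], [0, 1]])           # next-state table
R = np.array([[0.0, 1.0], [0.0, 1.0]])   # reward table
pi = tree_softmax_policy(P, R, state=0, depth=2)   # favors action 1
```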
1 code implementation • 5 Jul 2022 • Benjamin Fuhrer, Yuval Shpigelman, Chen Tessler, Shie Mannor, Gal Chechik, Eitan Zahavi, Gal Dalal
As communication protocols evolve, datacenter network utilization increases.
1 code implementation • 30 May 2022 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
no code implementations • 28 Jan 2022 • Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal
Some of the most powerful reinforcement learning frameworks use planning for action selection.
no code implementations • ICLR 2022 • Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit
We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.
1 code implementation • NeurIPS 2021 • Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik
We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance than the original pre-trained agent, even when given access to the exact state and reward in future steps.
no code implementations • 18 Feb 2021 • Chen Tessler, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, Shie Mannor
We approach the task of network congestion control in datacenters using Reinforcement Learning (RL).
2 code implementations • ICLR 2021 • Esther Derman, Gal Dalal, Shie Mannor
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
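A minimal way to picture the $m$-step delay (an illustrative wrapper of our own, not the paper's implementation) is an action queue: the action executed now was submitted $m$ steps ago, and the agent's effective state is the observation paired with the queue of pending actions.

```python
from collections import deque

# Illustrative m-step execution-delay wrapper (names are ours): each
# submitted action waits m steps in a queue before being executed.

class DelayedEnv:
    def __init__(self, env, m, default_action):
        self.env = env
        self.queue = deque([default_action] * m, maxlen=m)

    def step(self, action):
        executed = self.queue.popleft()   # the action chosen m steps ago
        self.queue.append(action)         # the current action waits its turn
        obs, reward = self.env.step(executed)
        # Effective state: observation plus pending-action queue.
        return (obs, tuple(self.queue)), reward

# Tiny deterministic environment for a smoke test: reward equals the action.
class Counter:
    def __init__(self):
        self.x = 0
    def step(self, a):
        self.x += a
        return self.x, float(a)

env = DelayedEnv(Counter(), m=2, default_action=0)
(s1, q1), r1 = env.step(1)   # executes the default action 0
(s2, q2), r2 = env.step(1)   # still executing a default 0
(s3, q3), r3 = env.step(0)   # now the first submitted 1 is executed
```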
no code implementations • 8 Dec 2020 • Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu
With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.
no code implementations • 20 Nov 2019 • Gal Dalal, Balazs Szorenyi, Gugan Thoppe
Algorithms such as these have two iterates, $\theta_n$ and $w_n$, which are updated using two distinct stepsize sequences, $\alpha_n$ and $\beta_n$, respectively.
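The two-stepsize structure can be illustrated on a toy coupled iteration (our own example, not any specific TD variant): with $\beta_n \gg \alpha_n$, the fast iterate $w_n$ tracks $\theta_n$, while the slow iterate $\theta_n$ drifts toward the coupled equilibrium.

```python
# Toy two-timescale iteration (illustrative): w_n chases theta_n on the fast
# timescale beta_n = n^(-0.6), while theta_n moves toward the point where
# 1 - w = 0 on the slow timescale alpha_n = 1/n.
theta, w = 0.0, 0.0
for n in range(1, 200001):
    alpha = 1.0 / n              # slow stepsize
    beta = 1.0 / n ** 0.6        # fast stepsize
    w += beta * (theta - w)      # fast iterate: track theta
    theta += alpha * (1.0 - w)   # slow iterate: drive the tracked value to 1
# Both iterates settle at the coupled fixed point theta = w = 1.
```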
no code implementations • NeurIPS 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.
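In a small known MDP, an $h$-step greedy lookahead (a toy sketch of ours, with deterministic transitions and a terminal value estimate at the horizon) reduces to a short recursion:

```python
import numpy as np

# Sketch of an h-step lookahead policy: choose the first action of the best
# h-step reward sequence, bootstrapping with a value estimate V at the horizon.

def lookahead_action(P, R, V, s, h, gamma=0.9):
    """Deterministic transitions: P[s, a] is the next state, R[s, a] the reward."""
    def q(s, d):
        if d == 0:
            return V[s]
        return max(R[s, a] + gamma * q(P[s, a], d - 1)
                   for a in range(P.shape[1]))
    return max(range(P.shape[1]),
               key=lambda a: R[s, a] + gamma * q(P[s, a], h - 1))

# Toy chain where a short-term sacrifice (action 1, reward 0) leads to a
# high-reward state, while action 0 collects reward 1 and stays put.
P = np.array([[0, 1], [1, 1]])
R = np.array([[1.0, 0.0], [10.0, 10.0]])
V = np.zeros(2)
a_myopic = lookahead_action(P, R, V, s=0, h=1)   # grabs the immediate 1
a_deep = lookahead_action(P, R, V, s=0, h=2)     # sees the 10 behind action 1
```

Deeper lookahead changes the chosen action precisely because it sees past the myopic reward.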
no code implementations • 6 Sep 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.
no code implementations • ICML 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.
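For reference, the classic alternation on a tiny tabular MDP (the numbers are our own illustration): exact policy evaluation by solving a linear system, then greedy improvement, repeated until the policy is stable.

```python
import numpy as np

# Tabular Policy Iteration on a toy 2-state, 2-action MDP (illustrative).
gamma = 0.9
# P[a, s, s'] transition probabilities, R[s, a] rewards.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],    # action 0: go to state 0
              [[0.0, 1.0], [0.0, 1.0]]])   # action 1: go to state 1
R = np.array([[0.0, 2.0],                  # state 0: a0 -> 0, a1 -> 2
              [0.0, 2.0]])                 # state 1: a0 -> 0, a1 -> 2

policy = np.zeros(2, dtype=int)
while True:
    # Policy evaluation: solve V = r_pi + gamma * P_pi V exactly.
    P_pi = np.array([P[policy[s], s] for s in range(2)])
    r_pi = np.array([R[s, policy[s]] for s in range(2)])
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    # Policy improvement: act greedily with respect to V.
    Q = R + gamma * np.array([[P[a, s] @ V for a in range(2)]
                              for s in range(2)])
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
# Optimal: always take action 1, V = 2 / (1 - gamma) = 20 in both states.
```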
6 code implementations • 26 Jan 2018 • Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru, Yuval Tassa
We address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated.
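A safety-layer style correction can be sketched as a closed-form projection (our simplified single-constraint reading): if a linearized constraint $c + g^\top a \le 0$ is violated by the proposed action, project the action back onto the constraint set.

```python
import numpy as np

# Sketch of a closed-form safety projection for one linearized constraint
# c + g . a <= 0 on the action (our simplified illustration).

def safe_action(a, g, c):
    """Project a onto {a' : c + g . a' <= 0} in closed form."""
    lam = max(0.0, (c + g @ a) / (g @ g))   # active-constraint multiplier
    return a - lam * g

g = np.array([1.0, 0.0])
a = np.array([2.0, 3.0])
a_safe = safe_action(a, g, c=-1.0)   # constraint reads a[0] <= 1
```

When the proposed action already satisfies the constraint, the multiplier is zero and the action passes through unchanged.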
no code implementations • 4 Apr 2017 • Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor
TD(0) is one of the most commonly used algorithms in reinforcement learning.
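For context, the textbook tabular TD(0) update, $V(s) \leftarrow V(s) + \alpha\,(r + \gamma V(s') - V(s))$, on a deterministic two-state cycle (toy numbers of our own):

```python
# Tabular TD(0) on a deterministic two-state cycle: state 0 -> 1 -> 0 -> ...,
# with reward 1 collected in state 1 and 0 in state 0 (illustrative toy).
gamma, alpha = 0.9, 0.1
V = [0.0, 0.0]
s = 0
for _ in range(10000):
    s_next = 1 - s
    r = float(s == 1)                              # reward of the current state
    V[s] += alpha * (r + gamma * V[s_next] - V[s])  # TD(0) update
    s = s_next
# True fixed point: V[1] = 1 / (1 - gamma^2), V[0] = gamma * V[1].
```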
no code implementations • 15 Mar 2017 • Gal Dalal, Balazs Szorenyi, Gugan Thoppe, Shie Mannor
Using this, we provide a concentration bound, the first such result for a two-timescale stochastic approximation (SA) scheme.
no code implementations • 20 Dec 2016 • Raphael Canyasse, Gal Dalal, Shie Mannor
In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF).
no code implementations • 30 Nov 2016 • Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel
We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, thereby making hierarchical long-term assessment and planning tractable for large power systems.
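The nearest-neighbor proxy idea can be sketched as follows (a toy stand-in of ours: the cost function, bank size, and dimensions are illustrative, not the paper's setup): precompute exact costs for a bank of commitment vectors, then answer new queries with the cost of the nearest stored neighbor.

```python
import numpy as np

# Toy nearest-neighbor cost proxy: trade an expensive exact solve for a
# lookup in a bank of precomputed (commitment vector, cost) pairs.

rng = np.random.default_rng(0)

def true_cost(u):
    """Stand-in for an expensive optimization solve (hypothetical quadratic)."""
    return float(u @ u + 2.0 * u.sum())

bank = rng.normal(size=(500, 4))               # precomputed commitment vectors
costs = np.array([true_cost(u) for u in bank]) # their exact costs

def nn_cost(query):
    i = np.argmin(np.linalg.norm(bank - query, axis=1))
    return costs[i]

q = bank[42] + 0.01 * rng.normal(size=4)       # a query near a stored point
approx = nn_cost(q)                            # close to true_cost(q)
```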
no code implementations • 6 Mar 2016 • Gal Dalal, Elad Gilboa, Shie Mannor
The power grid is a complex and vital system that necessitates careful reliability management.
no code implementations • 19 Jul 2015 • Gal Dalal, Shie Mannor
In this work we solve the day-ahead unit commitment (UC) problem by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling.
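A toy rendition of the MDP view (all costs and the load profile below are our own illustrative numbers, not the paper's model): state = (hour, unit status), action = the unit's on/off status for the coming hour, and backward induction recovers a min-cost commitment schedule.

```python
import numpy as np

# Toy day-ahead unit commitment as an MDP: one unit, 24 hours, cost =
# fuel while on + startup cost on switching on + penalty for unserved demand.
H = 24
demand = np.array([0.0] * 8 + [3.0] * 8 + [0.0] * 8)   # hypothetical load
CAPACITY, FUEL, STARTUP, SHORTFALL = 4.0, 1.0, 2.0, 5.0

def stage_cost(hour, status, turned_on):
    supply = CAPACITY if status else 0.0
    unserved = max(0.0, demand[hour] - supply)
    return FUEL * status + STARTUP * turned_on + SHORTFALL * unserved

# Backward induction: V[h, s] = minimal cost from hour h onward with status s.
V = np.zeros((H + 1, 2))
policy = np.zeros((H, 2), dtype=int)
for h in range(H - 1, -1, -1):
    for s in (0, 1):
        costs = [stage_cost(h, a, int(a == 1 and s == 0)) + V[h + 1, a]
                 for a in (0, 1)]
        policy[h, s] = int(np.argmin(costs))
        V[h, s] = min(costs)
```

The resulting policy keeps the unit off in the low-demand hours and starts it up only for the peak block.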