no code implementations • 7 Sep 2023 • Joachim A. Behar, Jeremy Levy, Eran Zvuloni, Sheina Gendelman, Aviv Rosenberg, Shany Biton, Raphael Derman, Jonathan A. Sobel, Alexandra Alexandrovich, Peter Charlton, Márton Á. Goda
PhysioZoo is a collaborative platform designed for the analysis of continuous physiological time series.
no code implementations • 15 May 2023 • Dirk van der Hoeven, Lukas Zierahn, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi
We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback.
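The paper analyzes FTRL in the delayed-bandit setting; as background, a minimal sketch of standard full-information FTRL over the probability simplex with a negative-entropy regularizer (equivalent to exponential weights) may help. This is an illustrative baseline, not the paper's algorithm; the delayed-bandit variant additionally requires loss estimation and delay handling.

```python
import numpy as np

def ftrl_simplex(loss_vectors, eta=0.5):
    """Follow The Regularized Leader over the probability simplex with a
    negative-entropy regularizer. At round t the learner plays
    argmin_p <L_{t-1}, p> + (1/eta) * sum_i p_i log p_i,
    which has the closed form p_i ∝ exp(-eta * L_{t-1,i})."""
    d = len(loss_vectors[0])
    cum_loss = np.zeros(d)          # L_{t-1}: cumulative observed losses
    plays = []
    for loss in loss_vectors:
        w = np.exp(-eta * cum_loss)
        p = w / w.sum()             # FTRL iterate for this round
        plays.append(p)
        cum_loss += np.asarray(loss)
    return plays

# Toy run: the second action always incurs zero loss,
# so the played distribution shifts toward it.
losses = [[1.0, 0.0]] * 5
dists = ftrl_simplex(losses)
print(dists[-1])
```

The closed-form update is specific to the entropy regularizer; other regularizers generally require solving the FTRL minimization numerically.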
no code implementations • 13 May 2023 • Tal Lancewicki, Aviv Rosenberg, Dmitry Sotnikov
Policy Optimization (PO) is one of the most popular methods in Reinforcement Learning (RL).
no code implementations • 7 Feb 2022 • Liyu Chen, Haipeng Luo, Aviv Rosenberg
Policy optimization is among the most popular and successful reinforcement learning algorithms, and there is increasing interest in understanding its theoretical guarantees.
no code implementations • 31 Jan 2022 • Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv Rosenberg
The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately.
no code implementations • 31 Jan 2022 • Tal Lancewicki, Aviv Rosenberg, Yishay Mansour
We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs).
no code implementations • 28 Jan 2022 • Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal
Some of the most powerful reinforcement learning frameworks use planning for action selection.
no code implementations • NeurIPS 2021 • Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg
In this work we show that the minimax regret for the stochastic shortest path setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$, where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space.
no code implementations • 29 Dec 2020 • Tal Lancewicki, Aviv Rosenberg, Yishay Mansour
We present novel algorithms based on policy optimization that achieve near-optimal high-probability regret of $\widetilde O ( \sqrt{K} + \sqrt{D} )$ under full-information feedback, where $K$ is the number of episodes and $D = \sum_{k} d^k$ is the total delay.
1 code implementation • NeurIPS 2021 • Aviv Rosenberg, Yishay Mansour
We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance.
no code implementations • 20 Jun 2020 • Aviv Rosenberg, Yishay Mansour
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost.
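To make the SSP objective concrete, here is a minimal value-iteration sketch on a small hypothetical MDP with a known model and a zero-cost absorbing goal state. This illustrates the planning problem only; the paper studies the harder learning setting where the model is unknown.

```python
import numpy as np

# Hypothetical 3-state SSP instance: states 0 and 1 are ordinary,
# state 2 is the goal (absorbing, cost-free).
P = {  # P[s][a] = transition distribution over states 0, 1, 2
    0: {0: [0.0, 1.0, 0.0], 1: [0.5, 0.0, 0.5]},
    1: {0: [0.0, 0.0, 1.0], 1: [1.0, 0.0, 0.0]},
}
c = {0: {0: 1.0, 1: 2.0}, 1: {0: 1.0, 1: 0.5}}  # per-step costs

V = np.zeros(3)  # V[2] stays 0: reaching the goal ends the episode
for _ in range(200):
    for s in (0, 1):
        # Bellman backup: minimize immediate cost plus expected cost-to-go
        V[s] = min(c[s][a] + np.dot(P[s][a], V) for a in (0, 1))
print(V[:2])  # minimum expected total cost to reach the goal
```

On this instance the iteration converges to $V(0) = 2$ and $V(1) = 1$: from state 1 it is cheaper to pay cost 1 and reach the goal directly than to loop back through state 0.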
no code implementations • ICML 2020 • Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg
In this work we remove this dependence on the minimum cost: we give an algorithm that guarantees a regret bound of $\widetilde{O}(B_\star |S| \sqrt{|A| K})$, where $B_\star$ is an upper bound on the expected cost of the optimal policy, $S$ is the set of states, $A$ is the set of actions, and $K$ is the number of episodes.
no code implementations • ICML 2020 • Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor
To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.
no code implementations • NeurIPS 2019 • Aviv Rosenberg, Yishay Mansour
We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes.
no code implementations • 19 May 2019 • Aviv Rosenberg, Yishay Mansour
We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner.