Search Results for author: Yuval Peres

Found 12 papers, 1 paper with code

Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

1 code implementation • NeurIPS 2019 • Andrey Kolobov, Yuval Peres, Cheng Lu, Eric J. Horvitz

From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages).

reinforcement-learning • Reinforcement Learning (RL) • +1
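A hedged illustration of the scheduling problem this abstract describes (not the paper's reinforcement-learning method): a toy greedy poller that spends a fixed request budget per step and learns each page's unknown change rate from its own polls. All rates and parameters below are made up.

```python
# Toy crawl scheduler for the freshness problem described above.
# NOT the paper's RL algorithm: pages change with hidden per-step
# probabilities, and an epsilon-greedy scheduler polls the pages with
# the highest estimated change rate under a fixed request budget.
import random

random.seed(0)
TRUE_RATES = [0.05, 0.2, 0.5, 0.8]    # hidden per-step change probabilities
BUDGET = 2                            # polls allowed per step

polls = [0] * len(TRUE_RATES)         # times each page was polled
changes = [0] * len(TRUE_RATES)       # changes observed on those polls

for step in range(10_000):
    # posterior-mean style estimate with a weak uniform prior
    est = [(changes[i] + 1) / (polls[i] + 2) for i in range(len(TRUE_RATES))]
    if random.random() < 0.1:         # occasional random exploration
        chosen = random.sample(range(len(TRUE_RATES)), BUDGET)
    else:                             # otherwise poll the fastest changers
        chosen = sorted(range(len(TRUE_RATES)), key=lambda i: -est[i])[:BUDGET]
    for i in chosen:
        polls[i] += 1
        changes[i] += random.random() < TRUE_RATES[i]

print("estimated change rates:",
      [round((c + 1) / (p + 2), 2) for c, p in zip(changes, polls)])
```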

Multiplayer Bandit Learning, from Competition to Cooperation

no code implementations • 3 Aug 2019 • Simina Brânzei, Yuval Peres

We show that competing players explore less than a single player: there is $p^* \in (m, g)$ so that for all $p > p^*$, the players stay at the predictable arm.
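A toy numerical illustration of this competition effect, with loudly flagged assumptions: beyond the symbols in the snippet, I assume the predictable arm pays a known $m$ per round, while the risky arm is good with prior probability $p$ and, when good, pays $g$ with chance $q$ each round (0 otherwise). This is not the paper's model or analysis; it only shows why a more patient player (a single-player benchmark) explores at priors where a myopic player (a crude proxy for competition) does not.

```python
# Illustrative two-armed toy: explore the risky arm while its posterior
# expected payoff plus an exploration bonus beats the known payoff m.
# The "myopic" player (bonus 0) stands in for competitive play.
import random

random.seed(1)
m, g, q, T = 0.5, 1.0, 0.8, 200

def run(p, bonus):
    good = random.random() < p
    post = p                               # posterior P(risky arm is good)
    total = 0.0
    for _ in range(T):
        if post * q * g + bonus >= m:      # explore the risky arm
            reward = g if (good and random.random() < q) else 0.0
            total += reward
            if reward:
                post = 1.0                 # a payoff reveals a good arm
            else:                          # Bayes update on a zero payoff
                post = post * (1 - q) / (post * (1 - q) + (1 - post))
        else:
            total += m                     # settle on the predictable arm
    return total

for p in (0.4, 0.5, 0.6):
    patient = sum(run(p, 0.15) for _ in range(3000)) / 3000
    myopic = sum(run(p, 0.0) for _ in range(3000)) / 3000
    print(f"p={p}: patient {patient:.1f}  myopic {myopic:.1f}")
```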

Sorted Top-k in Rounds

no code implementations • 12 Jun 2019 • Mark Braverman, Jieming Mao, Yuval Peres

When the comparisons are noiseless, we characterize how the optimal sample complexity depends on the number of rounds (up to a polylogarithmic factor for general $r$ and up to a constant factor for $r=1$ or 2).
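In the noiseless, fully adaptive regime (unbounded $r$), sorted top-$k$ is a classic tournament/heap exercise; the paper's question is how the comparison count grows as adaptivity is limited to $r$ rounds. A minimal comparison-counting sketch of the adaptive baseline:

```python
# Comparison-counting sketch of the fully adaptive (r unbounded)
# baseline for noiseless sorted top-k. Round-limited algorithms pay
# extra comparisons for fewer rounds, the trade-off quantified above.
import heapq

COMPARISONS = 0

class Item:
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        global COMPARISONS
        COMPARISONS += 1              # count every pairwise comparison
        return self.v < other.v

def sorted_top_k(items, k):
    """Return the k largest values in sorted order, adaptively."""
    return [it.v for it in heapq.nlargest(k, items)]

items = [Item(v) for v in [5, 1, 9, 3, 7, 8, 2, 6, 4, 0]]
print(sorted_top_k(items, 3), "using", COMPARISONS, "comparisons")
```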

Online Learning with an Almost Perfect Expert

no code implementations • 30 Jul 2018 • Simina Brânzei, Yuval Peres

We study the multiclass online learning problem where a forecaster makes a sequence of predictions using the advice of $n$ experts.
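For concreteness, here is a minimal multiclass weighted-majority forecaster in this setting, with one deliberately "almost perfect" expert. This is a standard baseline with placeholder parameters, not the paper's optimal algorithm.

```python
# Multiclass prediction with expert advice: predict the weight-heaviest
# label, then multiplicatively down-weight experts that erred. Sketch only.
import random
from collections import defaultdict

random.seed(2)
n, T, labels = 5, 1000, [0, 1, 2]
weights = [1.0] * n
eta = 0.5                             # penalty factor per mistake
mistakes = 0

for t in range(T):
    truth = random.choice(labels)
    # expert 0 is "almost perfect"; the rest guess uniformly at random
    advice = [truth if (i == 0 and random.random() < 0.95)
              else random.choice(labels) for i in range(n)]
    votes = defaultdict(float)
    for w, a in zip(weights, advice):
        votes[a] += w
    guess = max(votes, key=votes.get)
    mistakes += guess != truth
    weights = [w * (1 - eta) if a != truth else w
               for w, a in zip(weights, advice)]

print(f"forecaster mistakes over {T} rounds: {mistakes}")
```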

Mixing time estimation in reversible Markov chains from a single sample path

no code implementations • NeurIPS 2015 • Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári

The interval is constructed around the relaxation time $t_{\text{relax}} = 1/\gamma$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
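A point-estimate sketch of the quantity involved, assuming numpy and a small reversible chain: form the empirical transition matrix from one path, symmetrize it with the empirical occupation measure, and read off $\hat{\gamma}$ and $\hat{t}_{\text{relax}} = 1/\hat{\gamma}$. The paper's actual contribution, the fully empirical confidence interval, is omitted here.

```python
# Point estimate of the relaxation time from a single sample path:
# empirical kernel -> symmetrization (valid under reversibility) ->
# absolute spectral gap -> t_relax = 1/gamma.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.8, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])        # a small reversible (birth-death) chain

n, x = 100_000, 0                      # one sample path of length n
counts = np.zeros_like(P)
for _ in range(n):
    y = rng.choice(3, p=P[x])
    counts[x, y] += 1
    x = y

P_hat = counts / counts.sum(axis=1, keepdims=True)   # empirical kernel
pi_hat = counts.sum(axis=1) / counts.sum()           # empirical occupation
D = np.diag(np.sqrt(pi_hat))
L = D @ P_hat @ np.linalg.inv(D)                     # similar to P_hat
eigs = np.sort(np.linalg.eigvalsh((L + L.T) / 2))    # symmetrize, then solve
gamma_hat = 1.0 - max(abs(eigs[0]), eigs[-2])        # absolute spectral gap
print("estimated relaxation time:", 1.0 / gamma_hat)
```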

Tight Lower Bounds for Multiplicative Weights Algorithmic Families

no code implementations • 11 Jul 2016 • Nick Gravin, Yuval Peres, Balasubramanian Sivan

We study the fundamental problem of prediction with expert advice and develop regret lower bounds for a large family of algorithms for this problem.
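The multiplicative-weights family in question keeps per-expert weights that decay exponentially in cumulative loss. A minimal sketch with a placeholder i.i.d. adversary (the paper's lower bounds, of course, use adversarial loss sequences):

```python
# Multiplicative weights / Hedge: sample an expert with probability
# proportional to exp(-eta * cumulative loss). Illustrative sketch.
import math, random

random.seed(3)
n, T = 3, 5000
eta = math.sqrt(math.log(n) / T)      # standard learning-rate tuning
cum_loss = [0.0] * n                  # cumulative loss of each expert
alg_loss = 0.0

for t in range(T):
    weights = [math.exp(-eta * L) for L in cum_loss]
    pick = random.choices(range(n), weights=weights)[0]
    losses = [random.random() for _ in range(n)]   # placeholder adversary
    alg_loss += losses[pick]
    cum_loss = [L + l for L, l in zip(cum_loss, losses)]

regret = alg_loss - min(cum_loss)
print(f"regret: {regret:.1f}  (sqrt(T log n) ≈ {math.sqrt(T * math.log(n)):.1f})")
```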

Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension

no code implementations • 23 Feb 2015 • Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres

We analyze the minimax regret of the adversarial bandit convex optimization problem.

Thompson Sampling

Approval Voting and Incentives in Crowdsourcing

no code implementations • 19 Feb 2015 • Nihar B. Shah, Dengyong Zhou, Yuval Peres

The growing need for labeled training data has made crowdsourcing an important part of machine learning.

Towards Optimal Algorithms for Prediction with Expert Advice

no code implementations • 10 Sep 2014 • Nick Gravin, Yuval Peres, Balasubramanian Sivan

Further, we show that the optimal algorithm for $2$ and $3$ experts is a probability matching algorithm (analogous to Thompson sampling) against a particular randomized adversary.

Thompson Sampling
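A simulation-style sketch of the probability-matching idea (the analogy to Thompson sampling): sample one imagined completion of the loss sequence from an assumed randomized adversary, then follow the expert that finishes as the leader in that sample. The i.i.d. adversary below is a placeholder, not the paper's particular randomized adversary, and no optimality is claimed for this sketch.

```python
# Probability matching by sampling: follow each expert with probability
# equal to its (sampled) chance of ending up the leader. Illustrative only.
import random

random.seed(4)
n, T = 2, 500
cum_loss = [0.0] * n
alg_loss = 0.0

for t in range(T):
    horizon = T - t
    # sample one future under the assumed adversary (i.i.d. uniform losses)
    sampled_final = [cum_loss[i] + sum(random.random() for _ in range(horizon))
                     for i in range(n)]
    pick = min(range(n), key=lambda i: sampled_final[i])
    losses = [random.random() for _ in range(n)]
    alg_loss += losses[pick]
    cum_loss = [L + l for L, l in zip(cum_loss, losses)]

print(f"algorithm loss {alg_loss:.1f}, best expert {min(cum_loss):.1f}")
```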

Online Learning with Composite Loss Functions

no code implementations • 18 May 2014 • Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres

This class includes problems where the algorithm's loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values.
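The loss class is easy to make concrete: with a memory of size $k$, the round-$t$ loss is a function of the last $k$ adversarial values. A short sketch of the three composite losses named above (the learning algorithm itself is omitted):

```python
# Composite losses over a sliding window of the last k adversarial values.
from collections import deque

def composite_losses(values, k, combine):
    """Yield combine(window) over a sliding window of the last k values."""
    window = deque(maxlen=k)
    for v in values:
        window.append(v)
        if len(window) == k:
            yield combine(list(window))

adversary = [0.9, 0.1, 0.5, 0.7, 0.2, 0.8]
print(list(composite_losses(adversary, 3, min)))             # min-type loss
print(list(composite_losses(adversary, 3, max)))             # max-type loss
print(list(composite_losses(adversary, 3,
      lambda w: 0.5 * w[-1] + 0.3 * w[-2] + 0.2 * w[-3])))   # linear memory
```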

Bandits with Switching Costs: $T^{2/3}$ Regret

no code implementations • 11 Oct 2013 • Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres

We prove that the player's $T$-round minimax regret in this setting is $\widetilde{\Theta}(T^{2/3})$, thereby closing a fundamental gap in our understanding of learning with bandit feedback.
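The matching upper bound comes from the standard blocking trick: run an Exp3-style algorithm over blocks of length roughly $T^{1/3}$, switching arms only between blocks, so the switching cost and the blocked regret both scale as $\widetilde{O}(T^{2/3})$. A hedged sketch with placeholder losses:

```python
# Blocked Exp3 sketch for bandits with unit switching costs: pick an
# arm per block, play it for the whole block, charge 1 per switch.
import math, random

random.seed(5)
K, T = 2, 30_000
B = max(1, round(T ** (1 / 3)))            # block length ~ T^(1/3)
n_blocks = T // B
eta = math.sqrt(math.log(K) / (K * n_blocks))
est_loss = [0.0] * K                       # importance-weighted estimates
prev_arm, switches, total_loss = None, 0, 0.0

for b in range(n_blocks):
    w = [math.exp(-eta * L) for L in est_loss]
    p = [x / sum(w) for x in w]
    arm = random.choices(range(K), weights=p)[0]
    if prev_arm is not None and arm != prev_arm:
        switches += 1
        total_loss += 1.0                  # unit switching cost
    prev_arm = arm
    block_loss = sum(random.random() * (0.4 if arm == 0 else 1.0)
                     for _ in range(B))    # arm 0 has lower mean loss
    total_loss += block_loss
    est_loss[arm] += (block_loss / B) / p[arm]   # Exp3-style estimate

print(f"block length {B}, switches {switches}, total loss {total_loss:.0f}")
```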
