Search Results for author: Yuval Peres

Found 12 papers, 1 paper with code

Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

1 code implementation • NeurIPS 2019 • Andrey Kolobov, Yuval Peres, Cheng Lu, Eric J. Horvitz

From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages).

reinforcement-learning • Reinforcement Learning (RL) • +1
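A hedged illustration of the scheduling problem this abstract describes (not the paper's reinforcement-learning method): a toy greedy poller that spends a fixed request budget per step and learns each page's unknown change rate from its own polls. All rates and parameters below are made up.

```python
# Toy crawl scheduler for the freshness problem described above.
# NOT the paper's RL algorithm: pages change with hidden per-step
# probabilities, and an epsilon-greedy scheduler polls the pages with
# the highest estimated change rate under a fixed request budget.
import random

random.seed(0)
TRUE_RATES = [0.05, 0.2, 0.5, 0.8]    # hidden per-step change probabilities
BUDGET = 2                            # polls allowed per step

polls = [0] * len(TRUE_RATES)         # times each page was polled
changes = [0] * len(TRUE_RATES)       # changes observed on those polls

for step in range(10_000):
    # posterior-mean style estimate with a weak uniform prior
    est = [(changes[i] + 1) / (polls[i] + 2) for i in range(len(TRUE_RATES))]
    if random.random() < 0.1:         # occasional random exploration
        chosen = random.sample(range(len(TRUE_RATES)), BUDGET)
    else:                             # otherwise poll the fastest changers
        chosen = sorted(range(len(TRUE_RATES)), key=lambda i: -est[i])[:BUDGET]
    for i in chosen:
        polls[i] += 1
        changes[i] += random.random() < TRUE_RATES[i]

print("estimated change rates:",
      [round((c + 1) / (p + 2), 2) for c, p in zip(changes, polls)])
```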

Multiplayer Bandit Learning, from Competition to Cooperation

no code implementations • 3 Aug 2019 • Simina Brânzei, Yuval Peres

We show that competing players explore less than a single player: there is $p^* \in (m, g)$ so that for all $p > p^*$, the players stay at the predictable arm.
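A toy numerical illustration of this competition effect, with loudly flagged assumptions: beyond the symbols in the snippet, I assume the predictable arm pays a known $m$ per round, while the risky arm is good with prior probability $p$ and, when good, pays $g$ with chance $q$ each round (0 otherwise). This is not the paper's model or analysis; it only shows why a more patient player (a single-player benchmark) explores at priors where a myopic player (a crude proxy for competition) does not.

```python
# Illustrative two-armed toy: explore the risky arm while its posterior
# expected payoff plus an exploration bonus beats the known payoff m.
# The "myopic" player (bonus 0) stands in for competitive play.
import random

random.seed(1)
m, g, q, T = 0.5, 1.0, 0.8, 200

def run(p, bonus):
    good = random.random() < p
    post = p                               # posterior P(risky arm is good)
    total = 0.0
    for _ in range(T):
        if post * q * g + bonus >= m:      # explore the risky arm
            reward = g if (good and random.random() < q) else 0.0
            total += reward
            if reward:
                post = 1.0                 # a payoff reveals a good arm
            else:                          # Bayes update on a zero payoff
                post = post * (1 - q) / (post * (1 - q) + (1 - post))
        else:
            total += m                     # settle on the predictable arm
    return total

for p in (0.4, 0.5, 0.6):
    patient = sum(run(p, 0.15) for _ in range(3000)) / 3000
    myopic = sum(run(p, 0.0) for _ in range(3000)) / 3000
    print(f"p={p}: patient {patient:.1f}  myopic {myopic:.1f}")
```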

Sorted Top-k in Rounds

no code implementations • 12 Jun 2019 • Mark Braverman, Jieming Mao, Yuval Peres

When the comparisons are noiseless, we characterize how the optimal sample complexity depends on the number of rounds (up to a polylogarithmic factor for general $r$ and up to a constant factor for $r=1$ or 2).
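In the noiseless, fully adaptive regime (unbounded $r$), sorted top-$k$ is a classic tournament/heap exercise; the paper's question is how the comparison count grows as adaptivity is limited to $r$ rounds. A minimal comparison-counting sketch of the adaptive baseline:

```python
# Comparison-counting sketch of the fully adaptive (r unbounded)
# baseline for noiseless sorted top-k. Round-limited algorithms pay
# extra comparisons for fewer rounds, the trade-off quantified above.
import heapq

COMPARISONS = 0

class Item:
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        global COMPARISONS
        COMPARISONS += 1              # count every pairwise comparison
        return self.v < other.v

def sorted_top_k(items, k):
    """Return the k largest values in sorted order, adaptively."""
    return [it.v for it in heapq.nlargest(k, items)]

items = [Item(v) for v in [5, 1, 9, 3, 7, 8, 2, 6, 4, 0]]
print(sorted_top_k(items, 3), "using", COMPARISONS, "comparisons")
```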

Online Learning with an Almost Perfect Expert

no code implementations • 30 Jul 2018 • Simina Brânzei, Yuval Peres

We study the multiclass online learning problem where a forecaster makes a sequence of predictions using the advice of $n$ experts.
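For concreteness, here is a minimal multiclass weighted-majority forecaster in this setting, with one deliberately "almost perfect" expert. This is a standard baseline with placeholder parameters, not the paper's optimal algorithm.

```python
# Multiclass prediction with expert advice: predict the weight-heaviest
# label, then multiplicatively down-weight experts that erred. Sketch only.
import random
from collections import defaultdict

random.seed(2)
n, T, labels = 5, 1000, [0, 1, 2]
weights = [1.0] * n
eta = 0.5                             # penalty factor per mistake
mistakes = 0

for t in range(T):
    truth = random.choice(labels)
    # expert 0 is "almost perfect"; the rest guess uniformly at random
    advice = [truth if (i == 0 and random.random() < 0.95)
              else random.choice(labels) for i in range(n)]
    votes = defaultdict(float)
    for w, a in zip(weights, advice):
        votes[a] += w
    guess = max(votes, key=votes.get)
    mistakes += guess != truth
    weights = [w * (1 - eta) if a != truth else w
               for w, a in zip(weights, advice)]

print(f"forecaster mistakes over {T} rounds: {mistakes}")
```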

Mixing time estimation in reversible Markov chains from a single sample path

no code implementations • NeurIPS 2015 • Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári

The interval is constructed around the relaxation time $t_{\text{relax}} = 1/\gamma$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
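A point-estimate sketch of the quantity involved, assuming numpy and a small reversible chain: form the empirical transition matrix from one path, symmetrize it with the empirical occupation measure, and read off $\hat{\gamma}$ and $\hat{t}_{\text{relax}} = 1/\hat{\gamma}$. The paper's actual contribution, the fully empirical confidence interval, is omitted here.

```python
# Point estimate of the relaxation time from a single sample path:
# empirical kernel -> symmetrization (valid under reversibility) ->
# absolute spectral gap -> t_relax = 1/gamma.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.8, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])        # a small reversible (birth-death) chain

n, x = 100_000, 0                      # one sample path of length n
counts = np.zeros_like(P)
for _ in range(n):
    y = rng.choice(3, p=P[x])
    counts[x, y] += 1
    x = y

P_hat = counts / counts.sum(axis=1, keepdims=True)   # empirical kernel
pi_hat = counts.sum(axis=1) / counts.sum()           # empirical occupation
D = np.diag(np.sqrt(pi_hat))
L = D @ P_hat @ np.linalg.inv(D)                     # similar to P_hat
eigs = np.sort(np.linalg.eigvalsh((L + L.T) / 2))    # symmetrize, then solve
gamma_hat = 1.0 - max(abs(eigs[0]), eigs[-2])        # absolute spectral gap
print("estimated relaxation time:", 1.0 / gamma_hat)
```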

Tight Lower Bounds for Multiplicative Weights Algorithmic Families

no code implementations • 11 Jul 2016 • Nick Gravin, Yuval Peres, Balasubramanian Sivan

We study the fundamental problem of prediction with expert advice and develop regret lower bounds for a large family of algorithms for this problem.
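The multiplicative-weights family in question keeps per-expert weights that decay exponentially in cumulative loss. A minimal sketch with a placeholder i.i.d. adversary (the paper's lower bounds, of course, use adversarial loss sequences):

```python
# Multiplicative weights / Hedge: sample an expert with probability
# proportional to exp(-eta * cumulative loss). Illustrative sketch.
import math, random

random.seed(3)
n, T = 3, 5000
eta = math.sqrt(math.log(n) / T)      # standard learning-rate tuning
cum_loss = [0.0] * n                  # cumulative loss of each expert
alg_loss = 0.0

for t in range(T):
    weights = [math.exp(-eta * L) for L in cum_loss]
    pick = random.choices(range(n), weights=weights)[0]
    losses = [random.random() for _ in range(n)]   # placeholder adversary
    alg_loss += losses[pick]
    cum_loss = [L + l for L, l in zip(cum_loss, losses)]

regret = alg_loss - min(cum_loss)
print(f"regret: {regret:.1f}  (sqrt(T log n) ≈ {math.sqrt(T * math.log(n)):.1f})")
```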

Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension

no code implementations • 23 Feb 2015 • Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres

We analyze the minimax regret of the adversarial bandit convex optimization problem.

Thompson Sampling

Approval Voting and Incentives in Crowdsourcing

no code implementations • 19 Feb 2015 • Nihar B. Shah, Dengyong Zhou, Yuval Peres

The growing need for labeled training data has made crowdsourcing an important part of machine learning.

Towards Optimal Algorithms for Prediction with Expert Advice

no code implementations • 10 Sep 2014 • Nick Gravin, Yuval Peres, Balasubramanian Sivan

Further, we show that the optimal algorithm for $2$ and $3$ experts is a probability matching algorithm (analogous to Thompson sampling) against a particular randomized adversary.

Thompson Sampling
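A simulation-style sketch of the probability-matching idea (the analogy to Thompson sampling): sample one imagined completion of the loss sequence from an assumed randomized adversary, then follow the expert that finishes as the leader in that sample. The i.i.d. adversary below is a placeholder, not the paper's particular randomized adversary, and no optimality is claimed for this sketch.

```python
# Probability matching by sampling: follow each expert with probability
# equal to its (sampled) chance of ending up the leader. Illustrative only.
import random

random.seed(4)
n, T = 2, 500
cum_loss = [0.0] * n
alg_loss = 0.0

for t in range(T):
    horizon = T - t
    # sample one future under the assumed adversary (i.i.d. uniform losses)
    sampled_final = [cum_loss[i] + sum(random.random() for _ in range(horizon))
                     for i in range(n)]
    pick = min(range(n), key=lambda i: sampled_final[i])
    losses = [random.random() for _ in range(n)]
    alg_loss += losses[pick]
    cum_loss = [L + l for L, l in zip(cum_loss, losses)]

print(f"algorithm loss {alg_loss:.1f}, best expert {min(cum_loss):.1f}")
```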

Online Learning with Composite Loss Functions

no code implementations • 18 May 2014 • Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres

This class includes problems where the algorithm's loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values.
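The loss class is easy to make concrete: with a memory of size $k$, the round-$t$ loss is a function of the last $k$ adversarial values. A short sketch of the three composite losses named above (the learning algorithm itself is omitted):

```python
# Composite losses over a sliding window of the last k adversarial values.
from collections import deque

def composite_losses(values, k, combine):
    """Yield combine(window) over a sliding window of the last k values."""
    window = deque(maxlen=k)
    for v in values:
        window.append(v)
        if len(window) == k:
            yield combine(list(window))

adversary = [0.9, 0.1, 0.5, 0.7, 0.2, 0.8]
print(list(composite_losses(adversary, 3, min)))             # min-type loss
print(list(composite_losses(adversary, 3, max)))             # max-type loss
print(list(composite_losses(adversary, 3,
      lambda w: 0.5 * w[-1] + 0.3 * w[-2] + 0.2 * w[-3])))   # linear memory
```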

Bandits with Switching Costs: $T^{2/3}$ Regret

no code implementations • 11 Oct 2013 • Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres

We prove that the player's $T$-round minimax regret in this setting is $\widetilde{\Theta}(T^{2/3})$, thereby closing a fundamental gap in our understanding of learning with bandit feedback.
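The matching upper bound comes from the standard blocking trick: run an Exp3-style algorithm over blocks of length roughly $T^{1/3}$, switching arms only between blocks, so the switching cost and the blocked regret both scale as $\widetilde{O}(T^{2/3})$. A hedged sketch with placeholder losses:

```python
# Blocked Exp3 sketch for bandits with unit switching costs: pick an
# arm per block, play it for the whole block, charge 1 per switch.
import math, random

random.seed(5)
K, T = 2, 30_000
B = max(1, round(T ** (1 / 3)))            # block length ~ T^(1/3)
n_blocks = T // B
eta = math.sqrt(math.log(K) / (K * n_blocks))
est_loss = [0.0] * K                       # importance-weighted estimates
prev_arm, switches, total_loss = None, 0, 0.0

for b in range(n_blocks):
    w = [math.exp(-eta * L) for L in est_loss]
    p = [x / sum(w) for x in w]
    arm = random.choices(range(K), weights=p)[0]
    if prev_arm is not None and arm != prev_arm:
        switches += 1
        total_loss += 1.0                  # unit switching cost
    prev_arm = arm
    block_loss = sum(random.random() * (0.4 if arm == 0 else 1.0)
                     for _ in range(B))    # arm 0 has lower mean loss
    total_loss += block_loss
    est_loss[arm] += (block_loss / B) / p[arm]   # Exp3-style estimate

print(f"block length {B}, switches {switches}, total loss {total_loss:.0f}")
```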
