Search Results for author: Pierre Perrault

Found 9 papers, 1 paper with code

Budgeted Online Influence Maximization

no code implementations ICML 2020 Pierre Perrault, Zheng Wen, Michal Valko, Jennifer Healey

We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set.

valid

Demonstration-Regularized RL

no code implementations 26 Oct 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning.

Reinforcement Learning (RL)

Fast Rates for Maximum Entropy Exploration

1 code implementation 14 Mar 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.

Reinforcement Learning (RL)

When Combinatorial Thompson Sampling meets Approximation Regret

no code implementations 22 Feb 2023 Pierre Perrault

We provide the first $\mathcal{O}(\log(T)/\Delta)$ approximation regret upper bound for CTS, obtained under a specific condition on the approximation oracle, allowing a reduction to the exact oracle analysis.

Thompson Sampling

On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions

no code implementations 5 Jan 2021 Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko

We demonstrate that from an algorithm guaranteeing an approximation factor for the ratio of submodular (RS) optimization problem, we can build another algorithm having a different kind of approximation guarantee -- weaker than the classical one -- for the difference of submodular (DS) optimization problem, and vice versa.

Data Structures and Algorithms

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

no code implementations NeurIPS 2020 Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko

In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family.

Thompson Sampling

Online A-Optimal Design and Active Linear Regression

no code implementations 20 Jun 2019 Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

By trying to minimize the $\ell^2$-loss $\mathbb{E} [\lVert\hat{\beta}-\beta^{\star}\rVert^2]$ the decision maker is actually minimizing the trace of the covariance matrix of the problem, which corresponds then to online A-optimal design.

regression

Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

no code implementations 11 Feb 2019 Pierre Perrault, Vianney Perchet, Michal Valko

We improve the efficiency of algorithms for stochastic combinatorial semi-bandits.

Finding the bandit in a graph: Sequential search-and-stop

no code implementations 6 Jun 2018 Pierre Perrault, Vianney Perchet, Michal Valko

We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution.

Multi-Armed Bandits
