Pierre Ménard, Gradient Ascent for Active Exploration in Bandit Problems, 2019 [pdf].

Aurélien Garivier, Hédi Hadiji, Pierre Ménard and Gilles Stoltz, KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints, 2018 [pdf, arxiv].

Aurélien Garivier, Pierre Ménard, Laurent Rossi, Thresholding Bandit for Dose-ranging: The Impact of Monotonicity, 2017 [pdf, arxiv].

Sébastien Gerchinovitz, Pierre Ménard and Gilles Stoltz, Fano’s inequality for random variables, 2017 [pdf, arxiv].

Aurélien Garivier, Pierre Ménard, A minimax and asymptotically optimal algorithm for stochastic bandits, Algorithmic Learning Theory, 2017 [pdf, arxiv].

Aurélien Garivier, Pierre Ménard and Gilles Stoltz, Explore first, exploit next: the true shape of regret in bandit problems, Mathematics of Operations Research, 2018 [pdf, arxiv].

On the notion of optimality in stochastic multi-armed bandit problems [slides].