We tackle the online learning-to-rank problem of assigning L items to K predefined positions on a web page in order to maximize the number of user clicks.
To address this problem, one can learn, in a multiple-play semi-bandit setting, the parameters of a behavioral click model, e.g., the so-called position-based model (PBM).
State-of-the-art algorithms rarely tackle the full PBM, i.e., the PBM with all of its parameters unknown.
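For reference, the PBM factorizes the probability of a click into an item term and a position term, and in the full PBM both sets of parameters must be learned. A minimal formulation (the notation below is introduced here for illustration):

\[
\mathbb{P}(\text{click on item } i \text{ at position } k) = \theta_i \, \kappa_k, \qquad \theta_i, \kappa_k \in [0, 1],
\]

where $\theta_i$ is the attractiveness of item $i$ and $\kappa_k$ the probability that the user examines position $k$.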
Moreover, efficient algorithmic frameworks such as Thompson Sampling or unimodal bandits have seldom been considered for diverse behavioral click models.
Three algorithmic contributions are presented in this thesis. Two of them build on the unimodal bandit setting: GRAB is specialized for the full PBM and explores a family of graphs parameterized by the ranking of the display positions, while UniRank applies to several click models and builds a graph over partitions of the items. Both algorithms are computationally efficient and achieve theoretical regret upper bounds on par with the state of the art.
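To give a flavour of the unimodal bandit principle that GRAB and UniRank build on, here is a minimal generic sketch in Python, in the spirit of leader-based unimodal algorithms such as OSUB; the graph, the index, and all names are illustrative assumptions, not the thesis algorithms. The learner identifies the empirical leader and restricts exploration to its neighbourhood in the graph.

import math

def unimodal_bandit_step(graph, means, counts, t):
    """One round of a generic unimodal bandit on a graph (illustrative sketch).
    graph:  dict arm -> list of neighbouring arms
    means:  dict arm -> empirical mean reward
    counts: dict arm -> number of plays
    t:      current round, starting at 1
    """
    # The leader is the arm with the best empirical mean so far.
    leader = max(means, key=means.get)
    # Under unimodality, exploring the leader and its neighbours is enough.
    candidates = [leader] + list(graph[leader])
    # Among the candidates, play the one with the largest UCB-style index.
    def index(arm):
        if counts[arm] == 0:
            return float("inf")
        return means[arm] + math.sqrt(2.0 * math.log(t) / counts[arm])
    return max(candidates, key=index)

# Hypothetical example on a path graph 0 - 1 - 2.
graph = {0: [1], 1: [0, 2], 2: [1]}
means = {0: 0.2, 1: 0.5, 2: 0.1}
counts = {0: 3, 1: 5, 2: 2}
arm_to_play = unimodal_bandit_step(graph, means, counts, t=10)

In GRAB the arms are rankings and the graph connects rankings that differ by a transposition; the sketch deliberately abstracts that structure away.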
The third contribution is a family of bandit algorithms designed to handle the full PBM, based on a Thompson Sampling framework coupled with Markov chain Monte Carlo (MCMC) sampling methods. Two MCMC methods are used: Langevin gradient descent in PB-LB, which shows good empirical regret with a low and stable computation time, and Metropolis-Hastings in PB-MHB, which is computationally heavier but achieves the lowest empirical regret reported in the state of the art under such weak model assumptions.
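As a rough illustration of the Thompson Sampling + MCMC recipe, here is a minimal Python sketch under simplifying assumptions (flat priors, a random-walk Metropolis-Hastings kernel, and examination probabilities assumed to decrease with position); the function names and parameters are ours, not the PB-MHB or PB-LB implementations. At each round, a few MCMC steps produce an approximate posterior sample of the PBM parameters, and the items are then displayed greedily with respect to that sample.

import numpy as np

def log_posterior(theta, kappa, clicks, views):
    """Bernoulli log-likelihood of a PBM under flat priors (illustrative).
    clicks[i, k] / views[i, k]: click and display counts of item i at position k."""
    p = np.clip(np.outer(theta, kappa), 1e-9, 1.0 - 1e-9)
    return float(np.sum(clicks * np.log(p) + (views - clicks) * np.log(1.0 - p)))

def mh_sample(theta, kappa, clicks, views, n_steps=10, scale=0.05, rng=None):
    """A few random-walk Metropolis-Hastings steps on the PBM parameters."""
    rng = rng or np.random.default_rng()
    lp = log_posterior(theta, kappa, clicks, views)
    for _ in range(n_steps):
        theta_new = theta + scale * rng.standard_normal(theta.shape)
        kappa_new = kappa + scale * rng.standard_normal(kappa.shape)
        # Proposals outside (0, 1) have zero posterior mass: reject them outright.
        if ((theta_new <= 0).any() or (theta_new >= 1).any()
                or (kappa_new <= 0).any() or (kappa_new >= 1).any()):
            continue
        lp_new = log_posterior(theta_new, kappa_new, clicks, views)
        if np.log(rng.random()) < lp_new - lp:  # Metropolis acceptance test
            theta, kappa, lp = theta_new, kappa_new, lp_new
    return theta, kappa

def thompson_recommendation(theta_sample, n_positions):
    """Thompson step: show the items with the largest sampled attractiveness first."""
    return np.argsort(-theta_sample)[:n_positions]

Replacing the random-walk proposal with a Langevin-type proposal (a gradient step plus noise) would move this sketch closer in spirit to PB-LB.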
Vianney PERCHET, Professor, ENSAE
Philippe PREUX, Professor, Université de Lille
President:
François TAÏANI, Professor, Université Rennes 1
Examiners:
Audrey DURAND, Assistant Professor, Université Laval
Jeremie MARY, Senior Researcher, Criteo
Claire VERNADE, Research Scientist, DeepMind
Thesis supervisors:
Elisa FROMONT, Professor, Université Rennes 1, IRISA
Romaric GAUDEL, Associate Professor (Maître de conférences), ENSAI, CREST