M. C. Martín Blanco, A. Jiménez Martín, A. Mateos Caballero
The multi-armed bandit problem has been deeply studied in statistics and has become fundamental in areas such as economics and artificial intelligence. The literature offers many allocation strategies/policies for this problem, from both frequentist and Bayesian perspectives. In this paper, we propose a novel allocation strategy, the possibilistic reward method, together with a dynamic extension. The uncertainty about the expected reward of each arm is first modelled by means of a possibilistic reward distribution. Next, a pignistic probability transformation converts these possibility distributions into probability distributions. Finally, a simulation experiment samples from each arm's probability distribution to identify the arm with the highest expected reward, which is then pulled. A numerical study shows that the proposed method outperforms other policies in the literature in all tested scenarios.
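The three-step procedure described above (possibilistic modelling, pignistic transformation, sampling-based arm selection) could be sketched as follows for discrete possibility distributions. This is only an illustrative outline under stated assumptions, not the paper's implementation: the discrete representation, the helper names, and the allocation step are all assumptions made for the sketch.

```python
import random

def pignistic(poss):
    """Convert a discrete possibility distribution into a probability
    distribution via the pignistic transformation.

    poss: dict mapping outcome -> possibility degree in [0, 1],
    assumed normalised (maximum degree equal to 1).
    """
    # Sort outcomes by decreasing possibility degree.
    items = sorted(poss.items(), key=lambda kv: kv[1], reverse=True)
    degrees = [d for _, d in items] + [0.0]
    prob = {}
    for i, (outcome, _) in enumerate(items):
        # Each outcome receives an equal share of the mass of every
        # alpha-cut (level set) it belongs to.
        prob[outcome] = sum((degrees[j] - degrees[j + 1]) / (j + 1)
                            for j in range(i, len(items)))
    return prob

def sample(prob):
    """Draw one outcome from a discrete probability distribution."""
    r, acc = random.random(), 0.0
    for outcome, p in prob.items():
        acc += p
        if r <= acc:
            return outcome
    return outcome  # guard against floating-point round-off

def possibilistic_reward_step(arm_poss):
    """One allocation step (illustrative): sample a candidate expected
    reward per arm from its pignistic distribution and return the arm
    with the highest draw, which would then be pulled."""
    draws = {arm: sample(pignistic(poss)) for arm, poss in arm_poss.items()}
    return max(draws, key=draws.get)
```

For example, with an arm whose possibility mass sits only on reward 1.0 and another whose mass sits only on 0.0, the step deterministically selects the former; with overlapping distributions the sampling induces exploration, in the spirit of Thompson sampling.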
Keywords: multi-armed bandit problem, possibilistic reward, numerical study
Scheduled
X09.3 Statistical Inference II
September 7, 2016, 17:30
Room 21.08