**Risk-averse multi-armed bandits and game theory**

University of Illinois at Urbana-Champaign. Doctoral committee: Hajek, Bruce; Shomorony, Ilan; Srikant, Rayadurgam. YEKKEHKHANY-DISSERTATION-2020.pdf (application/pdf, 7 MB)

The multi-armed bandit (MAB) and game theory literature is mainly focused on the expected cumulative reward and the expected payoffs in a game, respectively. In contrast, the rewards and the payoffs are often random variables whose expected values only capture a vague idea of the overall distribution. The focus of this dissertation is to study the fundamental limits of the existing bandit and game theory problems in a risk-averse framework and to propose new ideas that address these shortcomings. The author believes that human beings are mostly risk-averse, so studying multi-armed bandits and game theory from the point of view of risk aversion, rather than expected reward/payoff, better captures reality. In this manner, a specific class of multi-armed bandits, called explore-then-commit bandits, and stochastic games are studied in this dissertation, based on the notion of Risk-Averse Best Action Decision with Incomplete Information (R-ABADI; Abadi is the maiden name of the author's mother). The goal of the classical multi-armed bandit problem is to exploit the arm with the maximum score, defined as the expected value of the arm reward. Instead, we propose a new definition of score that is derived from the joint distribution of all arm rewards and captures the reward of an arm relative to those of all other arms. We use a similar idea for games and propose a risk-averse R-ABADI equilibrium in game theory that is possibly different from the Nash equilibrium. The payoff distributions are taken into account to derive the risk-averse equilibrium, whereas the expected payoffs are used to find the Nash equilibrium. The fundamental properties of games, e.g. pure and mixed risk-averse R-ABADI equilibria and strict dominance, are studied in the new framework, and the results are extended to finite-time games. Furthermore, stochastic congestion games are studied from a risk-averse perspective, and three classes of equilibria are proposed for such games. It is shown by examples that the risk-averse behavior of travelers in a stochastic congestion game can improve the price of anarchy in Pigou and Braess networks. Moreover, the Braess paradox does not occur to the extent originally proposed when travelers are risk-averse.
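As a rough illustration of the distinction between the two kinds of score (a sketch only; the reward model and the `p_best` statistic below are illustrative stand-ins, not the dissertation's R-ABADI definitions): a mean-based score can prefer a heavy-tailed arm that rarely pays off, while a score computed from the joint distribution of arm rewards can instead rank arms by how often each one is the best arm in the same round.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical reward model (an assumption for illustration only):
# arm A pays roughly 1 every round; arm B pays 10 with probability 0.12
# and 0 otherwise, so B has the larger mean (1.2) but rarely wins a round.
arm_a = rng.normal(loc=1.0, scale=0.1, size=n)
arm_b = 10.0 * (rng.random(n) < 0.12)
rewards = np.stack([arm_a, arm_b], axis=1)  # joint samples, one row per round

# Classical score: expected reward per arm -> prefers arm B here.
mean_score = rewards.mean(axis=0)

# Joint-distribution score (illustrative): the fraction of rounds in which
# each arm yields the highest reward among all arms.
p_best = (rewards.argmax(axis=1)[:, None] == np.arange(2)).mean(axis=0)

print(mean_score)  # roughly [1.0, 1.2]: the mean criterion picks arm B
print(p_best)      # roughly [0.88, 0.12]: arm A wins most rounds
```

A risk-averse player who cares about winning the round, rather than the long-run average, would rank the arms in the opposite order here.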
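For context on the price-of-anarchy claim, the classical risk-neutral Pigou network is the standard baseline: one unit of traffic chooses between a link with constant cost 1 and a link whose cost equals its load. A quick computation of its well-known 4/3 price of anarchy (the baseline only, not the dissertation's risk-averse variant):

```python
# Classic Pigou network: one unit of traffic, two parallel links with
# costs c1(x) = 1 (constant) and c2(x) = x (congestion-dependent).

def total_cost(x2):
    """Average travel cost when a fraction x2 of the traffic takes link 2."""
    x1 = 1.0 - x2
    return x1 * 1.0 + x2 * x2  # x1 * c1 + x2 * c2(x2)

# Wardrop equilibrium: link 2 never costs more than link 1 (x2 <= 1), so
# selfish travelers all take link 2 and everyone pays cost 1.
eq_cost = total_cost(1.0)

# Social optimum: minimize (1 - x2) + x2**2 over a grid; the minimizer is
# x2 = 1/2, giving cost 3/4.
grid = [i / 10_000 for i in range(10_001)]
opt_cost = min(total_cost(x) for x in grid)

print(eq_cost, opt_cost, eq_cost / opt_cost)  # 1.0 0.75 1.333... (PoA = 4/3)
```

The dissertation's point, as summarized above, is that risk-averse travelers facing *stochastic* link costs can push this ratio closer to 1.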
Here’s Alex’s announcement of his new book, which I am very excited about, and many in our community would no doubt find extremely useful (there’s even an open version on arXiv!):

I am pleased to announce *Introduction to Multi-Armed Bandits*, a broad and accessible introduction to the area which emphasizes connections to operations research, game theory, and mechanism design. The said connections have generated a considerable amount of interest (and publications) in the Economics and Computation community.

Each chapter handles one big direction in the literature on bandits, covers the first-order concepts and results on a technical level, and provides a detailed literature review for further exploration. There are no prerequisites other than a certain level of mathematical maturity. The book is teachable by design: each chapter corresponds to one week of my class.

The chapters are as follows: stochastic bandits; lower bounds; Bayesian bandits and Thompson sampling; Lipschitz bandits; full feedback and adversarial costs; adversarial bandits; linear costs and semi-bandits; contextual bandits; bandits and games; bandits with knapsacks; and bandits and incentives.