Shrinkage in Contextual Bandits with Dependent Observations

Jules Kruijswijk (first author), Maurits Kaptein, Florian Böing-Messing and I have recently been working on a paper on the application of partial pooling (or shrinkage) models in (contextual) multi-armed bandit problems:

The (contextual) multi-armed bandit problem (cMAB) provides a formalization for sequential experiments which has many applications. The context in the cMAB problem is expected to provide extra information that potentially influences the distribution of the rewards. In some applications of cMAB problems we find that there are hierarchical structures in the data that influence the reward distributions; in such cases we effectively end up with a dependence between observations originating from the same context. Current research in the cMAB literature either ignores this structure, or models observations at the lowest level of analysis, effectively ignoring the hierarchy. In this paper we introduce means of exploiting hierarchical structures via so-called partial pooling (or shrinkage) models, and we adapt a number of popular cMAB policies to incorporate these strategies. Through simulation we show that we can improve the performance of a number of popular policies by including partial pooling when hierarchical structures are indeed present, and we validate this result in an empirical study. Furthermore, we discuss how we can further improve the proposed adaptations.
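To give a feel for what partial pooling does, here is a minimal sketch in Python (not R, and not the estimator from the paper). It assumes a simple weighting scheme in which a group's own success rate is pulled toward the overall rate, and pulled harder when the group has few observations. The function name and the `prior_strength` parameter are hypothetical and exist only for this illustration.

```python
def partially_pooled_rate(group_successes, group_pulls,
                          total_successes, total_pulls,
                          prior_strength=10.0):
    """Shrink one group's success rate toward the overall success rate.

    prior_strength is a hypothetical tuning knob: it acts like a number of
    pseudo-observations drawn from the overall rate.
    """
    # Completely pooled estimate: ignores the grouping altogether.
    pooled = total_successes / total_pulls if total_pulls else 0.5
    if group_pulls == 0:
        # Nothing observed for this group yet, so fall back on the pooled rate.
        return pooled
    # Completely unpooled estimate: uses this group's data only.
    unpooled = group_successes / group_pulls
    # The weight on the group's own data grows with its number of observations.
    weight = group_pulls / (group_pulls + prior_strength)
    return weight * unpooled + (1.0 - weight) * pooled


if __name__ == "__main__":
    # A group with 2 observations is shrunk strongly toward the overall rate.
    print(partially_pooled_rate(2, 2, 3000, 10000))
    # A group with 200 observations stays close to its own rate.
    print(partially_pooled_rate(150, 200, 3000, 10000))
```

With few observations the estimate sits near the pooled rate, and with many it approaches the unpooled rate, which is the behaviour the abstract refers to as shrinkage.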

We were able to implement and run the paper’s simulations in record time through the use of my R package contextual.

It took just five basic steps:

  1. Implementing BernoulliBandit by subclassing contextual’s Bandit superclass.
  2. Implementing pooled, partially pooled and unpooled versions of (c)MAB policies by subclassing contextual’s Policy superclass.
  3. Defining the parameters of all of the simulations we wanted to run in a top-level R script.
  4. Running the top-level R script on a 256-core EC2 instance with Louis Aslett’s RStudio Server Amazon Machine Image (AMI). As contextual runs simulations in parallel by default, it automatically took advantage of every available core, so our simulations completed in a fraction of the time they would have taken on my trusty old ThinkPad.
  5. Plotting and saving the results, making use of contextual’s flexible and efficient built-in Plot class.

All of the code for the simulations can be found in contextual’s demo directory, under paper_dependent_observations.
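For readers who just want the gist of steps 1 to 3 without opening the demo directory, the following is a rough, self-contained Python sketch. It is not the paper's code and does not use contextual's API: the class names, method names, the epsilon-greedy policy, the `prior_strength` parameter and all simulation settings are assumptions chosen for illustration. It sets up a Bernoulli bandit in which every user has their own reward probabilities, and runs pooled, unpooled and partially pooled versions of the same policy against it.

```python
import random


class BernoulliUserBandit:
    """Hypothetical bandit: each user has their own success probability per arm."""

    def __init__(self, n_users, n_arms, rng):
        self.rng = rng
        self.n_users, self.n_arms = n_users, n_arms
        # Arm-level base rates plus user-level deviations, clipped to [0.05, 0.95].
        base = [rng.uniform(0.2, 0.8) for _ in range(n_arms)]
        self.probs = [[min(0.95, max(0.05, b + rng.gauss(0.0, 0.15))) for b in base]
                      for _ in range(n_users)]

    def get_context(self):
        # The "context" is simply the id of the user who shows up.
        return self.rng.randrange(self.n_users)

    def get_reward(self, user, arm):
        return 1 if self.rng.random() < self.probs[user][arm] else 0


class EpsilonGreedy:
    """Epsilon-greedy with pooled, unpooled or partially pooled estimates."""

    def __init__(self, n_users, n_arms, mode, rng, epsilon=0.1, prior_strength=5.0):
        self.n_arms, self.mode, self.rng = n_arms, mode, rng
        self.epsilon, self.prior_strength = epsilon, prior_strength
        self.succ = [[0] * n_arms for _ in range(n_users)]
        self.pulls = [[0] * n_arms for _ in range(n_users)]

    def estimate(self, user, arm):
        n = self.pulls[user][arm]
        if self.mode == "unpooled":
            return self.succ[user][arm] / n if n else 0.5
        total_pulls = sum(row[arm] for row in self.pulls)
        total_succ = sum(row[arm] for row in self.succ)
        pooled = total_succ / total_pulls if total_pulls else 0.5
        if self.mode == "pooled" or n == 0:
            return pooled
        # Partial pooling: weight the user's own rate by how much data they have.
        weight = n / (n + self.prior_strength)
        return weight * self.succ[user][arm] / n + (1.0 - weight) * pooled

    def get_action(self, user):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)  # explore
        estimates = [self.estimate(user, a) for a in range(self.n_arms)]
        return max(range(self.n_arms), key=estimates.__getitem__)  # exploit

    def set_reward(self, user, arm, reward):
        self.pulls[user][arm] += 1
        self.succ[user][arm] += reward


def run(mode, horizon=2000, n_users=20, n_arms=3, seed=1):
    """Run one simulation and return the cumulative expected regret."""
    # Separate generators, so every mode is run against the same bandit.
    bandit = BernoulliUserBandit(n_users, n_arms, random.Random(seed))
    policy = EpsilonGreedy(n_users, n_arms, mode, random.Random(seed + 1))
    regret = 0.0
    for _ in range(horizon):
        user = bandit.get_context()
        arm = policy.get_action(user)
        policy.set_reward(user, arm, bandit.get_reward(user, arm))
        regret += max(bandit.probs[user]) - bandit.probs[user][arm]
    return regret


if __name__ == "__main__":
    for mode in ("pooled", "unpooled", "partial"):
        print(mode, round(run(mode), 1))
```

A single run like this says very little on its own; which variant does best depends on the settings, and a meaningful comparison needs many repeated simulations, which is exactly where the parallelization in step 4 earns its keep.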

Happily, the resulting statistics and plots confirmed our hypotheses:



