  
  
Stat 
Members: 3657 Articles: 2'599'751 Articles rated: 2609
14 October 2024 

   

Article overview
 

Contextual Bandits with Knapsacks for a Conversion Model  Zhen Li
; Gilles Stoltz
;  Date: 
1 Jun 2022  Abstract:  We consider contextual bandits with knapsacks, with an underlying structure
between rewards generated and cost vectors suffered. We do so motivated by
sales with commercial discounts. At each round, given the stochastic i.i.d.
context $mathbf{x}_t$ and the arm picked $a_t$ (corresponding, e.g., to a
discount level), a customer conversion may be obtained, in which case a reward
$r(a,mathbf{x}_t)$ is gained and vector costs $c(a_t,mathbf{x}_t)$ are
suffered (corresponding, e.g., to losses of earnings). Otherwise, in the
absence of a conversion, the reward and costs are null. The reward and costs
achieved are thus coupled through the binary variable measuring conversion or
the absence thereof. This underlying structure between rewards and costs is
different from the linear structures considered by Agrawal and Devanur [2016]
but we show that the techniques introduced in this article may also be applied
to the latter case. Namely, the adaptive policies exhibited solve at each round
a linear program based on upperconfidence estimates of the probabilities of
conversion given $a$ and $mathbf{x}$. This kind of policy is most natural and
achieves a regret bound of the typical order (OPT/$B$) $sqrt{T}$, where $B$ is
the total budget allowed, OPT is the optimal expected reward achievable by a
static policy, and $T$ is the number of rounds.  Source:  arXiv, 2206.00314  Services:  Forum  Review  PDF  Favorites 


No review found.
Did you like this article?
Note: answers to reviews or questions about the article must be posted in the forum section.
Authors are not allowed to review their own article. They can use the forum section.

 


