| | |
| | |
Stat |
Members: 3669 Articles: 2'599'751 Articles rated: 2609
22 March 2025 |
|
| | | |
|
Article overview
| |
|
Low-rank Bandits with Latent Mixtures | Aditya Gopalan
; Odalric-Ambrym Maillard
; Mohammadi Zaki
; | Date: |
6 Sep 2016 | Abstract: | We study the task of maximizing rewards from recommending items (actions) to
users sequentially interacting with a recommender system. Users are modeled as
latent mixtures of C many representative user classes, where each class
specifies a mean reward profile across actions. Both the user features (mixture
distribution over classes) and the item features (mean reward vector per class)
are unknown a priori. The user identity is the only contextual information
available to the learner while interacting. This induces a low-rank structure
on the matrix of expected rewards r a,b from recommending item a to user b. The
problem reduces to the well-known linear bandit when either user or item-side
features are perfectly known. In the setting where each user, with its
stochastically sampled taste profile, interacts only for a small number of
sessions, we develop a bandit algorithm for the two-sided uncertainty. It
combines the Robust Tensor Power Method of Anandkumar et al. (2014b) with the
OFUL linear bandit algorithm of Abbasi-Yadkori et al. (2011). We provide the
first rigorous regret analysis of this combination, showing that its regret
after T user interactions is $ ilde O(Csqrt{BT})$, with B the number of
users. An ingredient towards this result is a novel robustness property of
OFUL, of independent interest. | Source: | arXiv, 1609.1508 | Services: | Forum | Review | PDF | Favorites |
|
|
No review found.
Did you like this article?
Note: answers to reviews or questions about the article must be posted in the forum section.
Authors are not allowed to review their own article. They can use the forum section.
|
| |
|
|
|