Article overview
Learning to Collaborate in Markov Decision Processes | Goran Radanovic; Rati Devidze; David Parkes; Adish Singla
Date: 23 Jan 2019
Abstract: We consider a two-agent MDP framework where agents repeatedly solve a task in
a collaborative setting. We study the problem of designing a learning algorithm
for the first agent (A1) that facilitates a successful collaboration even in
cases when the second agent (A2) is adapting its policy in an unknown way. The
key challenge in our setting is that the presence of the second agent leads to
non-stationarity and non-obliviousness of rewards and transitions for the first
agent.
We design novel online learning algorithms for agent A1 whose regret decays
as $O(T^{1-\frac{3}{7} \cdot \alpha})$ with $T$ learning episodes provided that
the magnitude of agent A2’s policy changes between any two consecutive episodes
is upper bounded by $O(T^{-\alpha})$. Here, the parameter $\alpha$ is assumed
to be strictly greater than $0$, and we show that this assumption is necessary
provided that the {\em learning parity with noise} problem is computationally
hard. We show that sub-linear regret of agent A1 further implies
near-optimality of the agents’ joint return for MDPs that manifest the
properties of a {\em smooth} game.
Source: arXiv, 1901.8029
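Worked reading of the regret bound stated in the abstract (an illustration under assumed values of $\alpha$, not a result quoted from the paper): dividing the bound by the number of episodes gives an average per-episode regret of $O(T^{1-\frac{3}{7} \cdot \alpha}) / T = O(T^{-\frac{3}{7} \cdot \alpha})$, which tends to $0$ as $T$ grows for any $\alpha > 0$. For instance, assuming $\alpha = 1$, the bound reads $O(T^{4/7})$; as $\alpha$ approaches $0$, the exponent $1-\frac{3}{7} \cdot \alpha$ approaches $1$ and the bound is no longer sub-linear in $T$.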