Article overview
Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
Authors: Qiwen Cui; Simon S. Du
Date: 1 Jun 2022
Abstract: This paper considers offline multi-agent reinforcement learning. We propose
the strategy-wise concentration principle which directly builds a confidence
interval for the joint strategy, in contrast to the point-wise concentration
principle that builds a confidence interval for each point in the joint action
space. For two-player zero-sum Markov games, by exploiting the convexity of the
strategy-wise bonus, we propose a computationally efficient algorithm whose
sample complexity enjoys a better dependency on the number of actions than the
prior methods based on the point-wise bonus. Furthermore, for offline
multi-agent general-sum Markov games, based on the strategy-wise bonus and a
novel surrogate function, we give the first algorithm whose sample complexity
only scales with $\sum_{i=1}^m A_i$ where $A_i$ is the action size of the $i$-th
player and $m$ is the number of players. In sharp contrast, the sample
complexity of methods based on the point-wise bonus would scale with the size
of the joint action space $\prod_{i=1}^m A_i$ due to the curse of multiagents.
Lastly, all of our algorithms can naturally take a pre-specified strategy class
$\Pi$ as input and output a strategy that is close to the best strategy in
$\Pi$. In this setting, the sample complexity only scales with $\log|\Pi|$
instead of $\sum_{i=1}^m A_i$.
Source: arXiv, 2206.00159
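As a rough illustration of the scaling contrast described in the abstract, the snippet below compares $\sum_i A_i$ (strategy-wise bonus), $\prod_i A_i$ (point-wise bonus over the joint action space), and $\log|\Pi|$ (pre-specified strategy class). The player count, action sizes, and strategy-class size are hypothetical placeholder values, not figures from the paper.

import math

# Illustrative only: m = 5 players with 10 actions each (made-up example values).
action_sizes = [10, 10, 10, 10, 10]

strategy_wise = sum(action_sizes)      # scaling with the strategy-wise bonus
point_wise = math.prod(action_sizes)   # size of the joint action space (point-wise bonus)

print(f"sum_i A_i  = {strategy_wise}")   # 50
print(f"prod_i A_i = {point_wise}")      # 100000 -- the 'curse of multiagents'

# With a pre-specified finite strategy class Pi (hypothetical size below),
# the dependence drops further to log|Pi|.
strategy_class_size = 1_000_000
print(f"log|Pi|    = {math.log(strategy_class_size):.1f}")  # ~13.8

Even for these modest values, the joint action space is three orders of magnitude larger than the sum of the individual action sizes, which is the gap the strategy-wise bonus is designed to avoid.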