Article overview
Title: Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
Authors: Kefan Dong; Yuanhao Wang; Xiaoyu Chen; Liwei Wang
Date: 27 Jan 2019
Abstract: A fundamental question in reinforcement learning is whether model-free algorithms are sample efficient. Recently, Jin et al. \cite{jin2018q} proposed a Q-learning algorithm with a UCB exploration policy and proved that it has a nearly optimal regret bound for finite-horizon episodic MDPs. In this paper, we adapt Q-learning with a UCB exploration bonus to infinite-horizon MDPs with discounted rewards, \emph{without} accessing a generative model. We show that the \textit{sample complexity of exploration} of our algorithm is bounded by $\tilde{O}\big(\frac{SA}{\epsilon^2(1-\gamma)^7}\big)$. This improves the previously best known result of $\tilde{O}\big(\frac{SA}{\epsilon^4(1-\gamma)^8}\big)$ in this setting, achieved by delayed Q-learning \cite{strehl2006pac}, and matches the lower bound in terms of $\epsilon$, $S$, and $A$ up to logarithmic factors.
Source: arXiv, 1901.09311
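The abstract describes tabular Q-learning driven by an optimism bonus: the agent acts greedily with respect to an optimistically initialized Q-table, and each update adds a confidence bonus that shrinks with the visit count of the state-action pair. The sketch below illustrates that idea in a toy discounted MDP. It is not the paper's algorithm: the bonus `bonus_c * sqrt(H / n)` is a simplified stand-in for the paper's carefully tuned confidence term, and the environment, function names, and constants are all illustrative assumptions.

```python
import numpy as np

def ucb_q_learning(env_step, num_states, num_actions, gamma=0.9,
                   bonus_c=1.0, steps=2000):
    """Tabular Q-learning with a UCB-style exploration bonus (sketch).

    env_step(s, a) -> (reward, next_state), rewards assumed in [0, 1].
    """
    H = 1.0 / (1.0 - gamma)                    # effective horizon 1/(1-gamma)
    Q = np.full((num_states, num_actions), H)  # optimistic init at the value cap
    N = np.zeros((num_states, num_actions), dtype=int)  # visit counts
    s = 0
    for _ in range(steps):
        a = int(np.argmax(Q[s]))               # act greedily on optimistic Q
        r, s_next = env_step(s, a)
        N[s, a] += 1
        n = N[s, a]
        alpha = (H + 1.0) / (H + n)            # learning rate in the style of Jin et al.
        bonus = bonus_c * np.sqrt(H / n)       # simplified confidence bonus (assumption)
        target = r + bonus + gamma * np.max(Q[s_next])
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * min(target, H)  # clip at Vmax
        s = s_next
    return Q

# Toy 2-state MDP: in state 0, action 1 pays reward 1 and stays put;
# every other choice pays 0 and moves to the other state.
def env_step(s, a):
    if s == 0 and a == 1:
        return 1.0, 0
    return 0.0, 1 - s

Q = ucb_q_learning(env_step, num_states=2, num_actions=2, gamma=0.9)
print(int(np.argmax(Q[0])))  # learned greedy action in state 0
```

Because the Q-table starts at the value cap H = 1/(1-γ), the agent tries under-visited actions until their shrinking bonus can no longer sustain an optimistic estimate, which is the mechanism behind the sample-complexity bound quoted above.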