Stats: Members: 3643 | Articles: 2'487'895 | Articles rated: 2609
28 March 2024
Article overview
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Authors: Shixiang Gu; Timothy Lillicrap; Zoubin Ghahramani; Richard E. Turner; Bernhard Schölkopf; Sergey Levine
Date: 1 Jun 2017

Abstract: Off-policy model-free deep reinforcement learning methods that reuse previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixings of off-policy gradient estimates with on-policy samples contribute to improvements in empirical performance. The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks.

Source: arXiv, 1706.00387
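The abstract describes interpolating off-policy updates with on-policy policy gradient updates. A minimal sketch of that mixing step, assuming a convex combination weighted by a coefficient nu in [0, 1]; the function and variable names here are illustrative, and the paper's full estimator additionally involves control-variate and critic terms not shown:

```python
import numpy as np

def interpolated_policy_gradient(g_on, g_off, nu):
    """Convexly mix an on-policy and an off-policy gradient estimate.

    nu = 0 recovers the pure on-policy estimator;
    nu = 1 recovers the pure off-policy estimator.
    (Hypothetical sketch, not the paper's reference implementation.)
    """
    assert 0.0 <= nu <= 1.0, "nu must lie in [0, 1]"
    # Weighted average of the two gradient vectors.
    return (1.0 - nu) * np.asarray(g_on) + nu * np.asarray(g_off)
```

For example, with gradients [1.0, 2.0] and [3.0, 4.0] and nu = 0.5, the mixed update direction is [2.0, 3.0].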
No review found.
Note: replies to reviews or questions about the article must be posted in the forum section. Authors are not allowed to review their own articles; they may use the forum section instead.