Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
Shixiang Gu; Timothy Lillicrap; Zoubin Ghahramani; Richard E. Turner; Bernhard Schölkopf; Sergey Levine
Date 1 Jun 2017
Abstract
Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different ways of mixing off-policy gradient estimates with on-policy samples contribute to improvements in empirical performance. The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on the state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks.
Source arXiv, 1706.0387
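As a rough illustration of the two ideas in the abstract (interpolating an on-policy gradient with a critic-based gradient, and using a critic as a control variate), here is a minimal sketch under heavily simplifying assumptions. It is not the paper's estimator: it assumes a single-state toy problem, a Gaussian policy with fixed standard deviation, a known quadratic reward Q(a) = -(a - C)^2, and a hypothetical, deliberately miscalibrated quadratic critic with optimum C_W. Because there is only one state, the distinction between on-policy and replay-buffer (off-policy) state distributions disappears, and the critic term is computed in closed form. The mixing weight `nu` and all function and constant names are illustrative.

```python
import random
import statistics

# Toy setting (assumption for illustration only, not the paper's setup):
# one state, Gaussian policy pi(a) = N(theta, SIGMA^2) with fixed SIGMA,
# and a known true action value Q(a) = -(a - C)^2.
C = 1.0       # optimum of the true Q
SIGMA = 0.5   # fixed policy standard deviation
THETA = 0.0   # current policy mean
C_W = 1.3     # hypothetical critic optimum, deliberately wrong (C_W != C)


def true_q(a):
    return -(a - C) ** 2


def true_grad(theta):
    # J(theta) = E[Q(a)] = -((theta - C)^2 + SIGMA^2), so dJ/dtheta = -2(theta - C).
    return -2.0 * (theta - C)


def score(a, theta):
    # d/dtheta log N(a; theta, SIGMA^2)
    return (a - theta) / SIGMA ** 2


def critic_q(a, c_w):
    # Hypothetical learned critic Q_w(a) = -(a - c_w)^2.
    return -(a - c_w) ** 2


def critic_grad(theta, c_w):
    # Exact d/dtheta E_{a~pi}[Q_w(a)] = -2(theta - c_w): no sampling noise,
    # but biased whenever c_w != C.
    return -2.0 * (theta - c_w)


def likelihood_ratio_grad(theta, actions):
    # On-policy (REINFORCE) estimate: sample mean of score(a) * Q(a).
    return statistics.fmean(score(a, theta) * true_q(a) for a in actions)


def interpolated_grad(theta, actions, c_w, nu):
    # Mix the two estimates: nu = 0 is purely on-policy, nu = 1 purely critic-based.
    return (1.0 - nu) * likelihood_ratio_grad(theta, actions) + nu * critic_grad(theta, c_w)


def control_variate_grad(theta, actions, c_w):
    # Use the critic as a control variate: subtract it inside the score term and
    # add back its exact gradient. Unbiased because E[score * Q_w] = grad E[Q_w].
    residual = statistics.fmean(
        score(a, theta) * (true_q(a) - critic_q(a, c_w)) for a in actions
    )
    return residual + critic_grad(theta, c_w)


def summarize(estimator, trials=2000, batch=10, seed=0):
    # Mean and standard deviation of an estimator over repeated on-policy batches.
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        actions = [rng.gauss(THETA, SIGMA) for _ in range(batch)]
        estimates.append(estimator(actions))
    return statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    print(f"true gradient: {true_grad(THETA):+.3f}")
    for nu in (0.0, 0.5, 1.0):
        mean, std = summarize(lambda acts, nu=nu: interpolated_grad(THETA, acts, C_W, nu))
        print(f"interpolated nu={nu:.1f}: mean={mean:+.3f} std={std:.3f}")
    mean, std = summarize(lambda acts: control_variate_grad(THETA, acts, C_W))
    print(f"control variate:        mean={mean:+.3f} std={std:.3f}")
```

In this toy case, `nu = 0` gives an unbiased but noisy estimate, while `nu = 1` has no sampling noise but inherits the critic's error; the expected bias of the mix is 2 * nu * (C_W - C), so it grows linearly with `nu`. The control-variate version stays unbiased regardless of the critic's error and, here, has noticeably lower variance than the plain likelihood-ratio estimate.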