Article overview
MULEX: Disentangling Exploitation from Exploration in Deep RL
Authors: Lucas Beyer; Damien Vincent; Olivier Teboul; Sylvain Gelly; Matthieu Geist; Olivier Pietquin
Date: 1 Jul 2019
Abstract: An agent learning through interactions should balance its action selection
process between probing the environment to discover new rewards and using the
information acquired in the past to adopt useful behaviour. This trade-off is
usually obtained by perturbing either the agent’s actions (e.g., ε-greedy or
Gibbs sampling) or the agent’s parameters (e.g., NoisyNet), or by modifying the
reward it receives (e.g., exploration bonus, intrinsic motivation, or
hand-shaped rewards). Here, we adopt a disruptive but simple and generic
perspective, where we explicitly disentangle exploration and exploitation.
Different losses are optimized in parallel, one of them coming from the true
objective (maximizing cumulative rewards from the environment) and others being
related to exploration. Every loss is used in turn to learn a policy that
generates transitions, all shared in a single replay buffer. Off-policy methods
are then applied to these transitions to optimize each loss. We showcase our
approach on a hard-exploration environment, show its sample-efficiency and
robustness, and discuss further implications. | Source: | arXiv, 1907.0868 | Services: | Forum | Review | PDF | Favorites |