Article overview
Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble
Fan-Ming Luo; Xingchen Cao; Yang Yu
Date: 1 Jun 2022
Abstract: Inverse reinforcement learning (IRL) recovers the underlying reward function
from expert demonstrations. A generalizable reward function is especially
desirable, as it captures the fundamental motivation of the expert. However,
classical IRL methods can only recover reward functions coupled with the
training dynamics, and are therefore hard to generalize to a changed
environment. Previous dynamics-agnostic reward learning methods rely on strict
assumptions, such as requiring the reward function to be state-only. This work
proposes a general approach
to learn transferable reward functions, Dynamics-Agnostic
Discriminator-Ensemble Reward Learning (DARL). Following the adversarial
imitation learning (AIL) framework, DARL learns a dynamics-agnostic
discriminator on a latent space mapped from the original state-action space.
The latent space is learned to contain as little information about the dynamics as possible.
Moreover, to reduce the reliance of the discriminator on policies, the reward
function is represented as an ensemble of the discriminators during training.
We assess DARL in four MuJoCo tasks with dynamics transfer. Empirical results
compared with the state-of-the-art AIL methods show that DARL can learn a
reward that is more consistent with the true reward, thus obtaining higher
environment returns.
Source: arXiv, 2206.00238
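The abstract describes two ideas: a discriminator that operates on a latent encoding of the state-action pair, and a reward formed from an ensemble of discriminators. The sketch below only illustrates that shape and is not the authors' implementation. Everything concrete in it is an assumption: the names `encode` and `ensemble_reward`, the tanh linear encoder, the logistic discriminators, the GAIL-style reward `-log(1 - D)`, and the plain mean over the ensemble. The abstract does not say how the latent space is trained or how the ensemble is aggregated.

```python
import numpy as np

def encode(state, action, W_enc):
    """Hypothetical encoder: map a state-action pair to a latent vector."""
    x = np.concatenate([state, action])
    return np.tanh(W_enc @ x)

def ensemble_reward(z, discriminators, eps=1e-8):
    """Average an AIL-style reward over several logistic discriminators.

    `discriminators` is a list of (w, b) pairs; each one scores the latent
    vector z with D(z) = sigmoid(w . z + b). The reward form -log(1 - D)
    and the mean aggregation are assumptions made for illustration.
    """
    rewards = []
    for w, b in discriminators:
        logit = float(w @ z + b)
        d = 1.0 / (1.0 + np.exp(-logit))
        rewards.append(-np.log(1.0 - d + eps))
    return float(np.mean(rewards))

# Toy usage with made-up dimensions: 3-dim state, 2-dim action, 4-dim latent.
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 5))
z = encode(np.zeros(3), np.zeros(2), W_enc)
snapshots = [(rng.normal(size=4), 0.0) for _ in range(3)]
r = ensemble_reward(z, snapshots)
```

In this toy form the ensemble members could be discriminator snapshots saved at different points in adversarial training, so that the reward depends less on any single policy the discriminator was trained against. That reading is an interpretation of the abstract, not a detail it states.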