Article overview
Title: Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation
Authors: Sanae Amani; Lin F. Yang; Ching-An Cheng
Date: 1 Jun 2022
Abstract: We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision processes (MDPs), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks, which may be adaptively chosen based on the agent's past behaviors. Remarkably, our algorithm uses only a sublinear number of planning calls, which means that the agent eventually learns a policy that is near optimal for multiple tasks (seen or unseen) without the need for deliberate planning. A key to this property is a new structural assumption that enables computation sharing across tasks during exploration. Specifically, for $K$ task episodes of horizon $H$, our algorithm has a regret bound $\tilde{\mathcal{O}}(\sqrt{(d^3 + d' d)H^4 K})$ based on $\mathcal{O}(dH\log(K))$ planning calls, where $d$ and $d'$ are the feature dimensions of the dynamics and rewards, respectively. This theoretical guarantee implies that our algorithm can enable a lifelong learning agent to accumulate experiences and learn to rapidly solve new tasks.
Source: arXiv, 2206.00270
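The $\mathcal{O}(dH\log(K))$ planning-call count quoted in the abstract is the kind of guarantee that typically comes from a lazy re-planning rule that only triggers when the collected feature information grows substantially. The sketch below is a hypothetical illustration of one standard such device, a determinant-doubling trigger on the regularized Gram matrix of features, which re-plans roughly $\mathcal{O}(d\log K)$ times over $K$ episodes; it is not taken from the paper, and the function name `lazy_planning_schedule` and the synthetic random features are assumptions made purely for illustration.

```python
import numpy as np

def lazy_planning_schedule(feature_dim, num_episodes, rng=None):
    """Illustrative lazy re-planning rule (assumption, not the paper's exact
    criterion): re-plan only when the determinant of the regularized Gram
    matrix of observed features doubles since the last planning call.
    This yields on the order of d * log(K) planning calls over K episodes."""
    rng = np.random.default_rng(rng)
    d = feature_dim
    gram = np.eye(d)                        # Lambda = I + sum_k phi_k phi_k^T
    det_at_last_plan = np.linalg.det(gram)
    planning_episodes = []
    for k in range(num_episodes):
        phi = rng.normal(size=d) / np.sqrt(d)   # stand-in for an observed feature vector
        gram += np.outer(phi, phi)              # accumulate information
        if np.linalg.det(gram) > 2.0 * det_at_last_plan:
            planning_episodes.append(k)         # re-plan only when information has doubled
            det_at_last_plan = np.linalg.det(gram)
    return planning_episodes

if __name__ == "__main__":
    calls = lazy_planning_schedule(feature_dim=10, num_episodes=5000, rng=0)
    # The number of calls grows like d * log(K), i.e. sublinearly in K.
    print(f"{len(calls)} planning calls over 5000 episodes")
```

Under this kind of schedule, most episodes reuse the most recently computed policy, which is one plausible way to read the abstract's claim that the agent "eventually learns a policy that is near optimal for multiple tasks without the need for deliberate planning."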