Article overview
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Authors: Harrison Lee; Samrat Phatale; Hassan Mansoor; Kellie Lu; Thomas Mesnard; Colton Bishop; Victor Carbune; Abhinav Rastogi

Date: 1 Sep 2023

Abstract: Reinforcement learning from human feedback (RLHF) is effective at aligning large language models (LLMs) to human preferences, but gathering high quality human preference labels is a key bottleneck. We conduct a head-to-head comparison of RLHF vs. RL from AI Feedback (RLAIF) - a technique where preferences are labeled by an off-the-shelf LLM in lieu of humans, and we find that they result in similar improvements. On the task of summarization, human evaluators prefer generations from both RLAIF and RLHF over a baseline supervised fine-tuned model in ~70% of cases. Furthermore, when asked to rate RLAIF vs. RLHF summaries, humans prefer both at equal rates. These results suggest that RLAIF can yield human-level performance, offering a potential solution to the scalability limitations of RLHF.

Source: arXiv, 2309.00267
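For context, the labeling step the abstract describes is mechanically simple: instead of human raters, an off-the-shelf LLM is prompted to pick the preferred one of two candidate responses, and those preferences feed the usual reward-model training pipeline. The Python sketch below illustrates only that labeling step; query_llm is a hypothetical stand-in for whatever off-the-shelf model API is used, and the prompt wording is illustrative, not the paper's exact template.

    # Minimal sketch of RLAIF-style preference labeling (illustrative only).
    # query_llm is a hypothetical stand-in for an off-the-shelf LLM API;
    # the prompt template below is an assumption, not the paper's wording.

    def query_llm(prompt: str) -> str:
        """Hypothetical call to an off-the-shelf LLM; returns its completion."""
        raise NotImplementedError("plug in your LLM client here")

    PROMPT_TEMPLATE = (
        "A good summary is concise, accurate, and covers the key points.\n\n"
        "Text: {text}\n\n"
        "Summary 1: {summary_a}\n"
        "Summary 2: {summary_b}\n\n"
        "Which summary is better? Answer with '1' or '2' only."
    )

    def label_preference(text: str, summary_a: str, summary_b: str) -> int:
        """Ask the LLM which candidate it prefers; returns 0 or 1."""
        answer = query_llm(PROMPT_TEMPLATE.format(
            text=text, summary_a=summary_a, summary_b=summary_b
        ))
        return 0 if answer.strip().startswith("1") else 1

    def build_preference_dataset(examples):
        """examples: iterable of (text, summary_a, summary_b) triples.
        Returns (text, chosen, rejected) triples for reward-model training."""
        dataset = []
        for text, a, b in examples:
            preferred = label_preference(text, a, b)
            chosen, rejected = (a, b) if preferred == 0 else (b, a)
            dataset.append((text, chosen, rejected))
        return dataset

The resulting (chosen, rejected) pairs would then be used exactly as human-labeled pairs are in RLHF, which is what makes the head-to-head comparison in the paper possible.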