Stats: Members: 3,645 | Articles: 2,501,711 | Articles rated: 2,609 | 20 April 2024
Article overview
PMI-Masking: Principled masking of correlated spans

Authors: Yoav Levine; Barak Lenz; Opher Lieber; Omri Abend; Kevin Leyton-Brown; Moshe Tennenholtz; Yoav Shoham
Date: 5 Oct 2020
Source: arXiv, 2010.01825

Abstract: Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs) such as BERT. We show that such uniform masking allows an MLM to minimize its training objective by latching onto shallow local signals, leading to pretraining inefficiency and suboptimal downstream performance. To address this flaw, we propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI), which jointly masks a token n-gram if it exhibits high collocation over the corpus. PMI-Masking motivates, unifies, and improves upon prior more heuristic approaches that attempt to address the drawback of random uniform token masking, such as whole-word masking, entity/phrase masking, and random-span masking. Specifically, we show experimentally that PMI-Masking reaches the performance of prior masking approaches in half the training time, and consistently improves performance at the end of training.
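To illustrate the collocation measure the abstract refers to, here is a minimal sketch of bigram PMI over a toy corpus. This is not the paper's implementation: the paper generalizes the measure to longer n-grams and computes it over a large pretraining corpus, whereas the corpus, tokenization, and function below are illustrative assumptions for the bigram case only.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Compute PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ) for each
    bigram in a token sequence, using simple count-based probabilities."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)          # number of unigram observations
    n_bi = len(tokens) - 1       # number of bigram observations
    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_joint = count / n_bi
        p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        pmi[(w1, w2)] = math.log2(p_joint / p_indep)
    return pmi

# Toy corpus (hypothetical): "new york" recurs as a unit, so its PMI is
# higher than that of incidental pairs of frequent words such as
# ("saw", "the"); a PMI-Masking-style strategy would mask it jointly.
corpus = ("the cat saw the dog near new york "
          "and the bird saw new york too").split()
scores = bigram_pmi(corpus)
```

High-PMI n-grams are exactly those whose joint frequency exceeds what the independent unigram frequencies predict, which is the "high collocation over the corpus" criterion the abstract describes.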