Article overview
| Title: | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding |
| Authors: | Shuohang Wang; Luowei Zhou; Zhe Gan; Yen-Chun Chen; Yuwei Fang; Siqi Sun; Yu Cheng; Jingjing Liu |
| Date: | 14 Sep 2020 |
| Abstract: | The Transformer has become ubiquitous in the deep learning field. One of the key ingredients behind its success is the self-attention mechanism, which allows fully connected contextual encoding over input tokens. However, despite its effectiveness in modeling short sequences, self-attention struggles when handling inputs with extremely long-range dependencies, as its complexity grows quadratically with respect to the sequence length. Therefore, long sequences are often encoded by the Transformer in chunks using a sliding window. In this paper, we propose Cluster-Former, a novel clustering-based sparse Transformer that performs attention across chunked sequences. Our proposed method allows information integration beyond local windows, which is especially beneficial for question answering (QA) and language modeling tasks that rely on long-range dependencies. Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks. |
| Source: | arXiv, 2009.06097 |
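The abstract's core idea is that tokens are grouped by the similarity of their hidden states rather than by position, so tokens from distant chunks that land in the same cluster can still attend to each other. The snippet below is a minimal, single-head sketch of that clustering-based sparse attention step, not the authors' implementation: it uses plain NumPy, a toy k-means, and omits the learned Q/K/V projections, multi-head attention, the local sliding-window layers, and all training details. All function and variable names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kmeans(x, k, iters=10, seed=0):
    """Toy k-means: assign each token state to one of k clusters."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every token state to every centroid, then nearest-centroid assignment.
        assign = np.argmin(((x[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return assign

def cluster_sparse_attention(h, num_clusters=4):
    """Attend only within clusters of similar hidden states.

    Tokens from far-apart chunks can interact as long as they fall into the
    same cluster, which is the sparsity pattern the abstract describes.
    """
    assign = kmeans(h, num_clusters)
    out = np.zeros_like(h)
    d = h.shape[-1]
    for c in range(num_clusters):
        idx = np.where(assign == c)[0]
        if len(idx) == 0:
            continue
        # Single head, no learned projections: queries, keys, and values are the states themselves.
        q = k = v = h[idx]
        scores = q @ k.T / np.sqrt(d)
        out[idx] = softmax(scores) @ v
    return out

# Toy usage: a "long" sequence of 64 token states with hidden size 16.
h = np.random.default_rng(1).normal(size=(64, 16))
print(cluster_sparse_attention(h).shape)  # (64, 16)
```

In the full model this cross-chunk cluster attention would be interleaved with the local sliding-window layers mentioned in the abstract; the sketch only illustrates why the attention cost stays far below quadratic in the full sequence length, since each token attends only to the members of its own cluster.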