Article overview
| Title: | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding |
| Authors: | Shuohang Wang; Luowei Zhou; Zhe Gan; Yen-Chun Chen; Yuwei Fang; Siqi Sun; Yu Cheng; Jingjing Liu |
| Date: | 14 Sep 2020 |
| Abstract: | The Transformer has become ubiquitous in the deep learning field. One of the key ingredients behind its success is the self-attention mechanism, which allows fully connected contextual encoding over input tokens. However, despite its effectiveness in modeling short sequences, self-attention struggles when handling inputs with extremely long-range dependencies, as its complexity grows quadratically with respect to the sequence length. Therefore, long sequences are often encoded by the Transformer in chunks using a sliding window. In this paper, we propose Cluster-Former, a novel clustering-based sparse Transformer that performs attention across chunked sequences. Our proposed method allows information integration beyond local windows, which is especially beneficial for question answering (QA) and language modeling tasks that rely on long-range dependencies. Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks. |
| Source: | arXiv, 2009.06097 |
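The abstract's core idea is that tokens are grouped by the similarity of their hidden states rather than by position, so tokens from distant chunks that land in the same cluster can still attend to each other. The snippet below is a minimal, single-head sketch of that clustering-based sparse attention step, not the authors' implementation: it uses plain NumPy, a toy k-means, and omits the learned Q/K/V projections, multi-head attention, the local sliding-window layers, and all training details. All function and variable names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kmeans(x, k, iters=10, seed=0):
    """Toy k-means: assign each token state to one of k clusters."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every token state to every centroid, then nearest-centroid assignment.
        assign = np.argmin(((x[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return assign

def cluster_sparse_attention(h, num_clusters=4):
    """Attend only within clusters of similar hidden states.

    Tokens from far-apart chunks can interact as long as they fall into the
    same cluster, which is the sparsity pattern the abstract describes.
    """
    assign = kmeans(h, num_clusters)
    out = np.zeros_like(h)
    d = h.shape[-1]
    for c in range(num_clusters):
        idx = np.where(assign == c)[0]
        if len(idx) == 0:
            continue
        # Single head, no learned projections: queries, keys, and values are the states themselves.
        q = k = v = h[idx]
        scores = q @ k.T / np.sqrt(d)
        out[idx] = softmax(scores) @ v
    return out

# Toy usage: a "long" sequence of 64 token states with hidden size 16.
h = np.random.default_rng(1).normal(size=(64, 16))
print(cluster_sparse_attention(h).shape)  # (64, 16)
```

In the full model this cross-chunk cluster attention would be interleaved with the local sliding-window layers mentioned in the abstract; the sketch only illustrates why the attention cost stays far below quadratic in the full sequence length, since each token attends only to the members of its own cluster.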