Article overview
Title: Task-Specific Expert Pruning for Sparse Mixture-of-Experts
Authors: Tianyu Chen; Shaohan Huang; Yuan Xie; Binxing Jiao; Daxin Jiang; Haoyi Zhou; Jianxin Li; Furu Wei
Date: Wed, 1 Jun 2022 07:09:01 GMT (1871kb,D)
Abstract: The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pre-training and has achieved promising results thanks to its model capacity. However, with trillions of parameters, MoE is hard to deploy in cloud or mobile environments. MoE inference requires expert parallelism, which is not hardware-friendly and is communication-expensive. Especially for resource-limited downstream tasks, such a sparse structure sacrifices a great deal of computing efficiency for limited performance gains. In this work, we observe that most experts contribute very little to MoE fine-tuning and inference. We further propose a general method to progressively drop the non-professional experts for the target downstream task, which preserves the benefits of MoE while reducing the MoE model to a single-expert dense model. Our experiments reveal that the fine-tuned single-expert model preserves 99.3% of the benefits of MoE across six different types of tasks while enjoying 2x inference speed with no communication cost.
Source: arXiv, 2206.00277
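
The abstract only outlines the pruning idea, so the sketch below is an illustrative reconstruction rather than the authors' implementation. It assumes that the routing probability mass an expert receives on the target task can serve as a proxy for its task-specific importance, and that the least-used experts are dropped between fine-tuning epochs until a single expert remains. All names here (expert_importance, progressively_prune, drop_per_step) are hypothetical and do not come from the paper.

import torch

def expert_importance(router_logits: torch.Tensor) -> torch.Tensor:
    # router_logits: [num_tokens, num_experts] gating scores collected on the
    # target downstream task during fine-tuning (assumed input format).
    probs = torch.softmax(router_logits, dim=-1)  # per-token routing distribution
    return probs.sum(dim=0)                       # total routing mass per expert

def progressively_prune(active_experts, importance, drop_per_step=1):
    # Drop the experts with the least routing mass. Repeating this between
    # fine-tuning epochs eventually leaves a single expert, i.e. an ordinary
    # dense feed-forward layer for the downstream task.
    if len(active_experts) <= drop_per_step:
        return active_experts[:1]
    ranked = sorted(active_experts, key=lambda e: importance[e].item())
    dropped = set(ranked[:drop_per_step])
    return [e for e in active_experts if e not in dropped]

In an actual MoE layer, the surviving expert's feed-forward weights would then stand in for the whole expert block, which removes the router and the all-to-all communication that expert parallelism otherwise requires at inference time.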