Article overview
SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

Authors: Mojtaba Valipour; Mehdi Rezagholizadeh; Hossein Rajabzadeh; Marzieh Tahaei; Boxing Chen; Ali Ghodsi
Date: 1 Sep 2023

Abstract: As the size of deep learning models continues to grow, finding optimal models
under memory and computation constraints becomes increasingly important.
Although the architecture and constituent building blocks of neural networks
usually allow them to be used in a modular way, their training process is not
aware of this modularity. Consequently, conventional neural network training
lacks the flexibility to adapt the computational load of the model during
inference. This paper proposes SortedNet, a generalized and scalable solution
to harness the inherent modularity of deep neural networks across various
dimensions for efficient dynamic inference. Our training considers a nested
architecture for the sub-models with shared parameters and trains them together
with the main model in a sorted and probabilistic manner. This sorted training
of sub-networks enables us to scale the number of sub-networks to hundreds
using a single round of training. We utilize a novel updating scheme during
training that combines random sampling of sub-networks with gradient
accumulation to improve training efficiency. Furthermore, the sorted nature of
our training leads to a search-free sub-network selection at inference time;
and the nested architecture of the resulting sub-networks leads to minimal
storage requirements and efficient switching between sub-networks at inference.
Our general dynamic training approach is demonstrated across various
architectures and tasks, including large language models and pre-trained vision
models. Experimental results show the efficacy of the proposed approach in
achieving efficient sub-networks while outperforming state-of-the-art dynamic
training approaches. Our findings demonstrate the feasibility of training up to
160 different sub-models simultaneously, showcasing the extensive scalability
of our proposed method while maintaining 96% of the model performance.

Source: arXiv, 2309.00255
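The abstract's core recipe (nested sub-networks that share the main model's parameters, trained jointly by randomly sampling sub-networks each step and accumulating their gradients before a single update) can be illustrated with a minimal PyTorch-style sketch. The code below is not the paper's implementation: it assumes nesting along depth only (sub-network k reuses the first k blocks of the main model plus a shared head), and the names NestedMLP and samples_per_step, as well as the synthetic training data, are illustrative placeholders.

# Minimal sketch of sorted, probabilistic training of nested sub-networks.
# Assumptions (not from the paper's released code): sub-networks are nested
# along depth only, and each training step samples a few depths at random and
# accumulates their gradients into the shared weights before one optimizer update.
import random
import torch
import torch.nn as nn

class NestedMLP(nn.Module):
    """Main model whose first-k-block prefixes act as shared-parameter sub-models."""
    def __init__(self, in_dim=32, hidden=64, num_blocks=8, num_classes=10):
        super().__init__()
        self.stem = nn.Linear(in_dim, hidden)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(num_blocks)
        )
        self.head = nn.Linear(hidden, num_classes)  # classifier shared by every depth

    def forward(self, x, depth=None):
        depth = len(self.blocks) if depth is None else depth
        h = torch.relu(self.stem(x))
        for block in self.blocks[:depth]:  # sub-network = prefix of the block list
            h = block(h)
        return self.head(h)

model = NestedMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
samples_per_step = 4  # sub-networks sampled per update (illustrative choice)

for step in range(100):  # dummy data in place of a real data loader
    x = torch.randn(16, 32)
    y = torch.randint(0, 10, (16,))
    optimizer.zero_grad()
    # One possible sampling scheme: always include the full model, then add
    # randomly sampled shallower depths; gradients from all sampled sub-networks
    # accumulate into the shared parameters before a single optimizer step.
    depths = [len(model.blocks)] + random.sample(range(1, len(model.blocks)), samples_per_step - 1)
    for d in depths:
        loss = criterion(model(x, depth=d), y) / len(depths)
        loss.backward()
    optimizer.step()

In the paper, the nesting spans multiple dimensions of the network rather than depth alone, and up to 160 sub-models are trained in one round; the sorted ordering then lets the sub-network that fits a given budget be selected directly at inference, without a search.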