Article overview
How neural networks find generalizable solutions: Self-tuned annealing in deep learning
Authors: Yu Feng; Yuhai Tu
Date: 6 Jan 2020
Source: arXiv, 2001.01678

Abstract: Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and the loss-function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (the inverse of curvature) for all SGD-based learning algorithms. To explain this inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned, landscape-dependent annealing strategy that finds generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting.
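
To make the abstract's central quantities concrete, here is a minimal sketch (not the authors' code) of how one might measure them: record weight snapshots along an SGD trajectory, take the weight variance along each principal direction of that trajectory, and estimate the flatness (inverse curvature) of the loss profile along the same directions. The toy linear-regression task, the step sizes, and the finite-difference curvature estimate are all illustrative assumptions; a toy problem of this kind need not reproduce the inverse relation the paper reports for deep networks.

import numpy as np

rng = np.random.default_rng(0)

# Toy synthetic regression task (illustrative stand-in for a real training set).
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=200)

def loss(w, Xb, yb):
    # Mean-squared-error loss on a (mini)batch.
    r = Xb @ w - yb
    return 0.5 * np.mean(r ** 2)

def grad(w, Xb, yb):
    # Gradient of the batch loss with respect to the weights.
    r = Xb @ w - yb
    return Xb.T @ r / len(yb)

# Plain minibatch SGD; collect weight snapshots after a burn-in period,
# once the trajectory is fluctuating around a low-loss solution.
w = rng.normal(size=10)
lr, batch = 0.05, 10
snapshots = []
for step in range(20000):
    idx = rng.integers(0, len(y), size=batch)
    w -= lr * grad(w, X[idx], y[idx])
    if step > 10000 and step % 10 == 0:
        snapshots.append(w.copy())
W = np.array(snapshots)

# Weight variance along each principal (PCA) direction of the trajectory.
W0 = W.mean(axis=0)
cov = np.cov(W.T)                    # 10x10 covariance of the weight fluctuations
var_pca, dirs = np.linalg.eigh(cov)  # variances and the corresponding directions

# Flatness along each PCA direction, taken here as 1/curvature of the
# full-batch loss profile, with curvature from a central finite difference.
eps = 1e-3
flatness = []
for i in range(dirs.shape[1]):
    v = dirs[:, i]
    curv = (loss(W0 + eps * v, X, y) - 2 * loss(W0, X, y)
            + loss(W0 - eps * v, X, y)) / eps ** 2
    flatness.append(1.0 / max(curv, 1e-12))

# The inverse variance-flatness relation predicts that flatter directions
# carry *smaller* weight variance; print the variances sorted by flatness.
order = np.argsort(flatness)
print("weight variance along each PCA direction, sorted by increasing flatness:")
print(var_pca[order])

Working in the PCA coordinate system of the trajectory means variance and flatness are compared direction by direction rather than in aggregate, which is what makes an inverse relation between the two observable at all.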