Scaling description of generalization with number of parameters in deep learning
Mario Geiger; Arthur Jacot; Stefano Spigler; Franck Gabriel; Levent Sagun; Stéphane d'Ascoli; Giulio Biroli; Clément Hongler; Matthieu Wyart
Date 6 Jan 2019
Abstract: We provide a description for the evolution of the generalization performance of fixed-depth fully-connected deep neural networks, as a function of their number of parameters $N$. In the setup where the number of data points is larger than the input dimension, as $N$ gets large, we observe that increasing $N$ at fixed depth reduces the fluctuations of the output function $f_N$ induced by initial conditions, with $\|f_N-\bar{f}_N\|\sim N^{-1/4}$ where $\bar{f}_N$ denotes an average over initial conditions. We explain this asymptotic behavior in terms of the fluctuations of the so-called Neural Tangent Kernel that controls the dynamics of the output function. For the task of classification, we predict these fluctuations to increase the true test error $\epsilon$ as $\epsilon_{N}-\epsilon_{\infty}\sim N^{-1/2} + \mathcal{O}(N^{-3/4})$. This prediction is consistent with our empirical results on the MNIST dataset and it explains in a concrete case the puzzling observation that the predictive power of deep networks improves as the number of fitting parameters grows. This asymptotic description breaks down at a so-called jamming transition which takes place at a critical $N=N^*$, below which the training error is non-zero. In the absence of regularization, we observe an apparent divergence $\|f_N\|\sim (N-N^*)^{-\alpha}$ and provide a simple argument suggesting $\alpha=1$, consistent with empirical observations. This result leads to a plausible explanation for the cusp in test error known to occur at $N^*$. Overall, our analysis suggests that once models are averaged, the optimal model complexity is reached just beyond the point where the data can be perfectly fitted, a result of practical importance that needs to be tested in a wide range of architectures and data sets.
Source arXiv, 1901.1608
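
The scaling laws quoted in the abstract are power laws in $N$, and one common way to check such a claim is a straight-line fit in log-log coordinates. The sketch below is illustrative only and is not the authors' code: the helper name `fit_power_law`, the prefactor 2.0, and the synthetic, noise-free data are all assumptions made for the example. It shows how an exponent such as the $-1/4$ in $\|f_N-\bar{f}_N\|\sim N^{-1/4}$ could be estimated from measured values.

```python
import numpy as np


def fit_power_law(n_values, y_values):
    """Fit y = c * N**a by least squares in log-log space.

    Hypothetical helper for illustration; returns (a, c).
    """
    log_n = np.log(np.asarray(n_values, dtype=float))
    log_y = np.log(np.asarray(y_values, dtype=float))
    # A power law is a straight line in log-log coordinates:
    # log y = a * log N + log c
    slope, intercept = np.polyfit(log_n, log_y, 1)
    return slope, np.exp(intercept)


# Synthetic, noise-free data following the N**(-1/4) law from the abstract.
# The prefactor 2.0 is an arbitrary assumption, not a value from the paper.
n_params = np.array([1e3, 1e4, 1e5, 1e6])
fluctuation = 2.0 * n_params ** -0.25

exponent, prefactor = fit_power_law(n_params, fluctuation)
print(f"fitted exponent: {exponent:.3f}, fitted prefactor: {prefactor:.3f}")
```

With real measurements, the fluctuation values would come from training several networks with different initial conditions at each $N$. The fit would then be meaningful only in the large-$N$ regime, away from the jamming transition at $N^*$ where the abstract says the asymptotic description breaks down.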