Science-advisor

 Article forum



Scaling description of generalization with number of parameters in deep learning
Mario Geiger; Arthur Jacot; Stefano Spigler; Franck Gabriel; Levent Sagun; Stéphane d'Ascoli; Giulio Biroli; Clément Hongler; Matthieu Wyart
Date: 6 Jan 2019
Abstract: We provide a description for the evolution of the generalization performance of fixed-depth fully-connected deep neural networks, as a function of their number of parameters $N$. In the setup where the number of data points is larger than the input dimension, as $N$ gets large, we observe that increasing $N$ at fixed depth reduces the fluctuations of the output function $f_N$ induced by initial conditions, with $\|f_N-\bar{f}_N\|\sim N^{-1/4}$ where $\bar{f}_N$ denotes an average over initial conditions. We explain this asymptotic behavior in terms of the fluctuations of the so-called Neural Tangent Kernel that controls the dynamics of the output function. For the task of classification, we predict these fluctuations to increase the true test error $\epsilon$ as $\epsilon_{N}-\epsilon_{\infty}\sim N^{-1/2} + \mathcal{O}(N^{-3/4})$. This prediction is consistent with our empirical results on the MNIST dataset, and it explains in a concrete case the puzzling observation that the predictive power of deep networks improves as the number of fitting parameters grows. This asymptotic description breaks down at a so-called jamming transition which takes place at a critical $N=N^*$, below which the training error is non-zero. In the absence of regularization, we observe an apparent divergence $\|f_N\|\sim (N-N^*)^{-\alpha}$ and provide a simple argument suggesting $\alpha=1$, consistent with empirical observations. This result leads to a plausible explanation for the cusp in test error known to occur at $N^*$. Overall, our analysis suggests that once models are averaged, the optimal model complexity is reached just beyond the point where the data can be perfectly fitted, a result of practical importance that needs to be tested in a wide range of architectures and data sets.
Source: arXiv, 1901.01608
Services: Forum | Review | PDF | Favorites
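Below is a minimal sketch (in PyTorch, not the authors' code) of the kind of ensemble experiment the abstract describes: train the same fixed-depth fully-connected architecture from several random initializations, average the resulting output functions, and measure how far individual networks deviate from that average as the parameter count $N$ grows. A synthetic regression task stands in for MNIST, and the depth, widths, learning rate and step counts are arbitrary illustrative choices; if the paper's $\|f_N-\bar{f}_N\|\sim N^{-1/4}$ scaling carried over to this toy setting, the printed fluctuation would shrink roughly like $N^{-1/4}$.

# Minimal sketch: fluctuations of the output function over random initializations,
# as a function of the number of parameters N (toy stand-in for the MNIST experiment).
import torch
import torch.nn as nn

d, n_train, n_test = 10, 200, 100           # input dim; n_train > d, as in the paper's setup
torch.manual_seed(0)
X = torch.randn(n_train, d)
y = torch.sin(X.sum(dim=1, keepdim=True))   # arbitrary smooth target
X_test = torch.randn(n_test, d)

def make_net(width):
    # fixed-depth, fully-connected architecture
    return nn.Sequential(nn.Linear(d, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, 1))

def train_net(width, seed, steps=1000, lr=0.1):
    torch.manual_seed(seed)
    net = make_net(width)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        ((net(X) - y) ** 2).mean().backward()
        opt.step()
    with torch.no_grad():
        return net(X_test).squeeze(1)        # f_N evaluated on held-out points

for width in (16, 32, 64, 128):
    n_params = sum(p.numel() for p in make_net(width).parameters())
    outputs = torch.stack([train_net(width, seed) for seed in range(10)])
    f_bar = outputs.mean(dim=0)                           # ensemble average over initial conditions
    fluct = (outputs - f_bar).norm(dim=1).mean().item()   # mean ||f_N - f_bar_N|| over seeds
    print(f"width={width:4d}  N={n_params:6d}  fluctuation={fluct:.4f}")

Checking the $\epsilon_{N}-\epsilon_{\infty}\sim N^{-1/2}$ prediction would additionally require a classification task and an estimate of the asymptotic error $\epsilon_{\infty}$, which is beyond this sketch.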
 

No messages have been posted in this article's forum yet. Do you have a question or comment about this article? Ask the community by posting a message in the forum.
If you want to rate this article, please use the review section.


To post a message in the forum, you need to log in or register first (registration is free): registration page





