  
  
27 September 2020 

   

Article overview
 

Scaling description of generalization with number of parameters in deep learning

Authors: Mario Geiger; Arthur Jacot; Stefano Spigler; Franck Gabriel; Levent Sagun; Stéphane d'Ascoli; Giulio Biroli; Clément Hongler; Matthieu Wyart

Date: 6 Jan 2019

Abstract: We provide a description for the evolution of the generalization performance
of fixed-depth, fully-connected deep neural networks, as a function of their
number of parameters $N$. In the setup where the number of data points is
larger than the input dimension, as $N$ gets large, we observe that increasing
$N$ at fixed depth reduces the fluctuations of the output function $f_N$
induced by initial conditions, with $\|f_N - \bar{f}_N\| \sim N^{-1/4}$, where
$\bar{f}_N$ denotes an average over initial conditions. We explain this
asymptotic behavior in terms of the fluctuations of the so-called Neural
Tangent Kernel that controls the dynamics of the output function. For the task
of classification, we predict these fluctuations to increase the true test
error $\epsilon$ as $\epsilon_N - \epsilon_\infty \sim N^{-1/2} + \mathcal{O}(N^{-3/4})$.
This prediction is consistent with our empirical results on the
MNIST dataset and it explains in a concrete case the puzzling observation that
the predictive power of deep networks improves as the number of fitting
parameters grows. This asymptotic description breaks down at a so-called
jamming transition which takes place at a critical $N = N^*$, below which the
training error is non-zero. In the absence of regularization, we observe an
apparent divergence $\|f_N\| \sim (N - N^*)^{-\alpha}$ and provide a simple
argument suggesting $\alpha = 1$, consistent with empirical observations. This
result leads to a plausible explanation for the cusp in test error known to
occur at $N^*$. Overall, our analysis suggests that once models are averaged,
the optimal model complexity is reached just beyond the point where the data
can be perfectly fitted, a result of practical importance that needs to be
tested in a wide range of architectures and data sets.

Source: arXiv, 1901.01608
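The central operation in the abstract's conclusion, averaging the output function over random initial conditions, can be illustrated with a minimal NumPy sketch. The network below is an untrained one-hidden-layer net with illustrative (hypothetical) sizes; it only demonstrates that ensembling over inits suppresses the fluctuation of $f_N$ around $\bar{f}_N$, not the paper's trained-network $N^{-1/4}$ law:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(width, d_in=2):
    # One hidden layer, Gaussian weights with 1/sqrt(fan-in) scaling,
    # so the output stays O(1) as width grows (illustrative setup only).
    W1 = rng.standard_normal((width, d_in)) / np.sqrt(d_in)
    w2 = rng.standard_normal(width) / np.sqrt(width)
    return W1, w2

def f(params, x):
    # Output function f_N for one draw of the initial conditions.
    W1, w2 = params
    return w2 @ np.tanh(W1 @ x)

x = np.array([0.3, -0.7])                      # a fixed test input
outputs = np.array([f(init_mlp(200), x) for _ in range(500)])

single_std = outputs.std()                     # fluctuation of a single network
ensemble_means = outputs.reshape(20, 25).mean(axis=1)
ensemble_std = ensemble_means.std()            # fluctuation after averaging 25 inits

print(single_std, ensemble_std)
```

Averaging 25 independent initializations shrinks the init-induced fluctuation by roughly a factor of 5 ($\sqrt{25}$), which is the mechanism behind comparing $f_N$ with $\bar{f}_N$ in the abstract.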


