 Article overview


Compressing invariant manifolds in neural nets
Jonas Paccolat; Leonardo Petrini; Mario Geiger; Kevin Tyloo; Matthieu Wyart
Date: 22 Jul 2020
Abstract: We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions, but whose label varies only within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the \textit{feature learning} regime) trained with gradient descent, the uninformative $d_\perp = d - d_\parallel$ space is compressed by a factor $\lambda \sim \sqrt{p}$, where $p$ is the size of the training set. We quantify the benefit of such a compression on the test error $\epsilon$. For large initialization of the weights (the \textit{lazy training} regime), no compression occurs and for regular boundaries separating labels we find that $\epsilon \sim p^{-\beta}$, with $\beta_\mathrm{Lazy} = d / (3d-2)$. Compression improves the learning curves so that $\beta_\mathrm{Feature} = (2d-1)/(3d-2)$ if $d_\parallel = 1$ and $\beta_\mathrm{Feature} = (d + \frac{d_\perp}{2})/(3d-2)$ if $d_\parallel > 1$. We test these predictions for a stripe model where boundaries are parallel interfaces ($d_\parallel = 1$) as well as for a cylindrical boundary ($d_\parallel = 2$). Next we show that compression shapes the Neural Tangent Kernel (NTK) evolution in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer FC network trained on the stripe model and for a 16-layer CNN trained on MNIST. The great similarities found in these two cases support that compression is central to the training of MNIST, and put forward kernel-PCA on the evolving NTK as a useful diagnostic of compression in deep nets.
Source: arXiv, 2007.11471
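
The NumPy sketch below (not the authors' code) illustrates the two measurements described in the abstract on a toy $d_\parallel = 1$ stripe model: a one-hidden-layer ReLU network is trained by full-batch gradient descent from small initial weights, then (i) the anisotropy of the first-layer weights between the informative direction and the $d-1$ uninformative ones is used as a crude proxy for the compression factor $\lambda$, and (ii) the fraction of the labels captured by the top eigenvectors of the empirical NTK is compared before and after training (the kernel-PCA diagnostic). The labelling function, hinge loss and all hyper-parameters are illustrative assumptions, not taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stripe model: x ~ N(0, I_d), label depends only on x_1 (d_parallel = 1) ---
d, p = 10, 500                         # input dimension, training-set size (illustrative)
X = rng.standard_normal((p, d))
y = np.sign(np.sin(2.0 * X[:, 0]))     # parallel interfaces along x_1 (illustrative labelling)

# --- One-hidden-layer ReLU network with small ("feature learning") initialization ---
h = 200                                # hidden width (illustrative)
scale = 1e-3                           # small-weight regime (assumption)
W0 = scale * rng.standard_normal((h, d))
a0 = scale * rng.standard_normal(h)

def forward(X, W, a):
    pre = X @ W.T                      # (p, h) pre-activations
    act = np.maximum(pre, 0.0)         # ReLU
    return act @ a, pre, act

def ntk_gram(X, W, a):
    """Empirical NTK Gram matrix K_ij = grad_theta f(x_i) . grad_theta f(x_j)."""
    pre = X @ W.T
    act = np.maximum(pre, 0.0)
    J_a = act                                           # df/da_n      -> (p, h)
    J_W = (a * (pre > 0))[:, :, None] * X[:, None, :]   # df/dW_{n,j}  -> (p, h, d)
    J = np.hstack([J_a, J_W.reshape(len(X), -1)])
    return J @ J.T

def label_fraction_on_top_eigvecs(K, y, k=10):
    """Share of the labels' squared norm lying in the span of the top-k eigenvectors of K."""
    _, vecs = np.linalg.eigh(K)        # eigenvalues/eigenvectors in ascending order
    top = vecs[:, -k:]
    return float(np.sum((top.T @ y) ** 2) / np.sum(y ** 2))

# --- Full-batch gradient descent on a hinge loss (illustrative choice) ---
W, a = W0.copy(), a0.copy()
lr, steps = 0.1, 5000
for _ in range(steps):
    f, pre, act = forward(X, W, a)
    mask = (y * f < 1.0).astype(float)         # examples still inside the hinge
    g = -(y * mask) / p                        # dLoss/df
    grad_a = act.T @ g
    grad_pre = np.outer(g, a) * (pre > 0)      # back-propagate through the ReLU
    grad_W = grad_pre.T @ X
    a -= lr * grad_a
    W -= lr * grad_W

# --- Compression proxy: first-layer weight mass, informative vs. uninformative axes ---
w_par = np.abs(W[:, 0]).mean()                 # component along the informative direction
w_perp = np.abs(W[:, 1:]).mean()               # average over the d-1 uninformative directions
print(f"weight anisotropy (proxy for the compression factor): {w_par / w_perp:.1f}")

# --- Kernel-PCA diagnostic: label content of the top NTK eigenvectors, before vs. after ---
print("label fraction on top-10 NTK eigenvectors:",
      f"init {label_fraction_on_top_eigvecs(ntk_gram(X, W0, a0), y):.2f},",
      f"trained {label_fraction_on_top_eigvecs(ntk_gram(X, W, a), y):.2f}")
```

If the abstract's predictions carry over to this toy setting, the printed weight anisotropy should grow with the training-set size $p$ (the claimed $\lambda \sim \sqrt{p}$), and the top eigenvectors of the trained NTK should capture a noticeably larger share of the labels than those of the NTK at initialization.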