Science-advisor
 Article overview


How isotropic kernels learn simple invariants
Jonas Paccolat; Stefano Spigler; Matthieu Wyart
Date 17 Jun 2020
Abstract We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task, where the target function is a Gaussian random field that depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We compute the expected test error $\epsilon$, which follows $\epsilon \sim p^{-\beta}$ where $p$ is the size of the training set. We find that $\beta \sim \frac{1}{d}$ independently of $d_\parallel$, supporting previous findings that the presence of invariants does not resolve the curse of dimensionality for kernel regression. (ii) Next we consider support-vector binary classification and introduce the {\it stripe model}, where the data label depends on a single coordinate, $y(\underline{x}) = y(x_1)$, corresponding to parallel decision boundaries separating labels of different signs, with no margin at these interfaces. We argue and confirm numerically that for large bandwidth, $\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi \in (0,2)$ is the exponent characterizing the singularity of the kernel at the origin. This estimate improves on the classical bounds obtainable from Rademacher complexity. In this setting there is no curse of dimensionality, since $\beta \rightarrow \frac{1}{3}$ as $d \rightarrow \infty$.
(iii) We confirm these findings for the {\it spherical model}, for which $y(\underline{x}) = y(\|\underline{x}\|)$. (iv) In the stripe model, we show that if the data are compressed along their invariants by some factor $\lambda$ (an operation believed to take place in deep networks), the test error is reduced by a factor $\lambda^{-\frac{2(d-1)}{3d-3+\xi}}$.
Source arXiv, 2006.09754
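
The scaling law in (ii) can be probed directly. Below is a minimal numerical sketch (illustrative only, not the authors' code) of the stripe-model experiment: the label depends on $x_1$ alone, a support-vector classifier with a Laplace kernel (whose singularity exponent is $\xi = 1$) is trained at increasing training-set sizes $p$, and the exponent $\beta$ is read off a log-log fit of the test error. The Gaussian input distribution, the interface placement via $\mathrm{sign}(\sin 2x_1)$, the bandwidth, and the regularization constant are all assumptions made for illustration.

# Illustrative sketch (not the authors' code): estimate the learning-curve
# exponent beta for the stripe model with a Laplace-kernel SVM (xi = 1).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import laplacian_kernel

rng = np.random.default_rng(0)
d = 5                                    # input dimension (illustrative)
train_sizes = [64, 128, 256, 512, 1024]  # training-set sizes p
n_test = 2000

def stripe_data(n):
    # Gaussian inputs; the label depends on x_1 only, giving parallel
    # interfaces with no margin. The stripe width is an arbitrary choice.
    X = rng.standard_normal((n, d))
    y = np.sign(np.sin(2.0 * X[:, 0]))
    return X, y

X_te, y_te = stripe_data(n_test)
errors = []
for p in train_sizes:
    X_tr, y_tr = stripe_data(p)
    # Large-bandwidth Laplace kernel (small gamma); a large C mimics the
    # hard-margin limit.
    K_tr = laplacian_kernel(X_tr, X_tr, gamma=0.1)
    K_te = laplacian_kernel(X_te, X_tr, gamma=0.1)
    clf = SVC(kernel="precomputed", C=1e4).fit(K_tr, y_tr)
    errors.append(max(np.mean(clf.predict(K_te) != y_te), 1e-6))

# Fit epsilon ~ p^{-beta} on a log-log scale and compare with the
# predicted exponent (d - 1 + xi) / (3d - 3 + xi) at xi = 1.
beta = -np.polyfit(np.log(train_sizes), np.log(errors), 1)[0]
print(f"measured beta ~ {beta:.2f}, predicted {(d - 1 + 1) / (3 * d - 3 + 1):.2f}")

A single run is noisy; averaging the test error over several independent draws of the training set gives a cleaner fit, and kernels with other singularity exponents $\xi \in (0,2)$ (e.g. Matérn kernels with smoothness below one) could be substituted to test the predicted $\xi$-dependence.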