Science-advisor

 Article forum



How isotropic kernels learn simple invariants
Jonas Paccolat; Stefano Spigler; Matthieu Wyart
Date: 17 Jun 2020
Abstract: We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task, where the target function is a Gaussian random field that depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We compute the expected test error $\epsilon$, which follows $\epsilon \sim p^{-\beta}$ where $p$ is the size of the training set. We find that $\beta \sim \frac{1}{d}$ independently of $d_\parallel$, supporting previous findings that the presence of invariants does not resolve the curse of dimensionality for kernel regression. (ii) Next we consider support-vector binary classification and introduce the {\it stripe model}, where the data label depends on a single coordinate, $y(\underline{x}) = y(x_1)$, corresponding to parallel decision boundaries separating labels of different signs, and consider that there is no margin at these interfaces. We argue and confirm numerically that for large bandwidth, $\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi \in (0,2)$ is the exponent characterizing the singularity of the kernel at the origin. This estimate improves on classical bounds obtainable from Rademacher complexity. In this setting there is no curse of dimensionality, since $\beta \rightarrow \frac{1}{3}$ as $d \rightarrow \infty$.
(iii) We confirm these findings for the {\it spherical model}, for which $y(\underline{x}) = y(\|\underline{x}\|)$. (iv) In the stripe model, we show that if the data are compressed along their invariants by some factor $\lambda$ (an operation believed to take place in deep networks), the test error is reduced by a factor $\lambda^{-\frac{2(d-1)}{3d-3+\xi}}$.
Source: arXiv, 2006.09754
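To make the stripe-model learning curve concrete, here is a minimal numerical sketch, not the authors' code: the choice of input distribution, label rule (a single interface, $y(\underline{x}) = \mathrm{sign}(x_1)$), kernel bandwidth, regularization, and sample sizes are all illustrative assumptions. It trains a hard-margin-like SVM with a Laplace kernel (for which $\xi = 1$) and fits the learning-curve exponent $\beta$ for comparison with the predicted $\frac{d-1+\xi}{3d-3+\xi}$.

# Minimal sketch (illustrative, not the authors' code): estimate the
# learning-curve exponent beta for the stripe model y(x) = sign(x_1)
# with a kernel SVM using a Laplace kernel, for which xi = 1.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 5            # input dimension (assumed)
p_test = 2000    # test set size (assumed)

def sample(n):
    """Isotropic Gaussian inputs; the label is the sign of x_1
    (a single interface, the simplest instance of the stripe model)."""
    X = rng.standard_normal((n, d))
    return X, np.sign(X[:, 0])

def laplace_gram(A, B, sigma=3.0):
    """Laplace kernel exp(-||a - b|| / sigma); sigma is chosen large
    relative to typical pairwise distances to probe the
    large-bandwidth regime discussed in the abstract."""
    return np.exp(-cdist(A, B) / sigma)

X_test, y_test = sample(p_test)
sizes = [64, 128, 256, 512, 1024, 2048]
errors = []
for p in sizes:
    X, y = sample(p)
    clf = SVC(kernel="precomputed", C=1e6)   # large C mimics a hard margin
    clf.fit(laplace_gram(X, X), y)
    err = np.mean(clf.predict(laplace_gram(X_test, X)) != y_test)
    errors.append(max(err, 1.0 / p_test))    # guard against log(0)

# Fit log(err) = -beta * log(p) + const and compare with the prediction.
beta_fit = -np.polyfit(np.log(sizes), np.log(errors), 1)[0]
xi = 1.0
beta_pred = (d - 1 + xi) / (3 * d - 3 + xi)
print(f"fitted beta ~ {beta_fit:.2f}, predicted {beta_pred:.2f}")

A few random seeds and a wider range of $p$ would give a more stable fit; the point of the sketch is only that the measured slope can be checked directly against the claimed exponent.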
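Prediction (iv) can also be made concrete with a quick worked instance, using assumed values: $d = 5$, a Laplace kernel ($\xi = 1$), and compression factor $\lambda = 2$.

# Worked instance of prediction (iv) with assumed values: compressing the
# d - 1 invariant directions by lambda = 2, at d = 5 and xi = 1 (Laplace
# kernel), should multiply the test error by lambda**(-2*(d-1)/(3*d-3+xi)).
d, xi, lam = 5, 1.0, 2.0
factor = lam ** (-2 * (d - 1) / (3 * d - 3 + xi))
print(f"predicted reduction of the test error: x{factor:.2f}")  # ~0.65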
No messages have been posted in this article's forum yet. If you have a question or comment about this article, ask the community by writing a message in the forum.
To rate this article, please use the review section.
