VARCLUST: clustering variables using dimensionality reduction
Piotr Sobczyk ; Stanislaw Wilczynski ; Malgorzata Bogdan ; Piotr Graczyk ; Julie Josse ; Fabien Panloup ; Valérie Seegers ; Mateusz Staniak
Date 12 Nov 2020
Abstract The VARCLUST algorithm is proposed for clustering variables under the assumption that the variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by random noise. The entire clustering task is viewed as a statistical model selection problem, where the model is defined by the number of clusters, the partition of variables into these clusters and the 'cluster dimensions', i.e. the vector of dimensions of the linear subspaces spanning each of the clusters. The optimal model is selected using an approximate Bayesian criterion based on Laplace approximations and a non-informative uniform prior on the number of clusters. To search over the huge space of possible models, we propose an extension of the ClustOfVar algorithm, which was dedicated to subspaces of dimension 1 only and which is similar in structure to the $K$-centroid algorithm. We provide a complete methodology with theoretical guarantees, extensive numerical experiments, complete data analyses and an implementation. Our algorithm assigns variables to appropriate clusters based on the consistent Bayesian Information Criterion (BIC), and estimates the dimensionality of each cluster by the PEnalized SEmi-integrated Likelihood Criterion (PESEL), whose consistency we prove. Additionally, we prove that each iteration of our algorithm leads to an increase of the Laplace approximation to the model posterior probability, and we provide a criterion for estimating the number of clusters. Numerical comparisons with other algorithms show that VARCLUST may outperform some popular machine learning tools for sparse subspace clustering. We also report the results of real data analyses, including TCGA breast cancer data and meteorological data. The proposed method is implemented in the publicly available R package varclust.
Source arXiv, 2011.06501
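
The alternating structure described in the abstract (fit a low-dimensional subspace per cluster, then reassign each variable to the best-fitting cluster) can be illustrated with a minimal Python sketch. This is not the varclust implementation: the function names `principal_components` and `cluster_variables` are hypothetical, the cluster dimension `dim` is fixed by the user instead of being estimated by PESEL, and variables are assigned by the squared projection onto each cluster's principal components. With equal dimensions across clusters, this ranks clusters the same way as a per-variable BIC under a Gaussian regression model, which is the simplifying assumption made here.

```python
import numpy as np


def principal_components(block, dim):
    """Return up to `dim` orthonormal factors (left singular vectors) of a block of variables."""
    u, _, _ = np.linalg.svd(block, full_matrices=False)
    return u[:, :dim]


def cluster_variables(X, n_clusters, dim, n_iter=20, seed=0):
    """K-centroid-style clustering of the columns of X into low-dimensional subspaces.

    Assumptions of this sketch: every cluster has the same fixed dimension `dim`,
    and no column of X is constant.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)             # center each variable
    X = X / np.linalg.norm(X, axis=0)  # scale each variable to unit norm
    n, p = X.shape
    labels = rng.integers(n_clusters, size=p)  # random initial partition

    for _ in range(n_iter):
        fit = np.zeros((n_clusters, p))
        for k in range(n_clusters):
            members = X[:, labels == k]
            if members.shape[1] == 0:
                # Re-seed an empty cluster with one randomly chosen variable.
                members = X[:, rng.choice(p, size=1)]
            factors = principal_components(members, dim)
            # Squared norm of the projection of each unit-norm variable onto
            # the cluster subspace, i.e. the R^2 of regressing it on the factors.
            fit[k] = np.sum((factors.T @ X) ** 2, axis=0)
        new_labels = fit.argmax(axis=0)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Synthetic data: two groups of 10 variables, each spanned by 2 latent factors plus noise.
    rng = np.random.default_rng(42)
    n = 100
    blocks = []
    for _ in range(2):
        latent = rng.normal(size=(n, 2))
        loadings = rng.normal(size=(2, 10))
        blocks.append(latent @ loadings + 0.1 * rng.normal(size=(n, 10)))
    X = np.hstack(blocks)
    print(cluster_variables(X, n_clusters=2, dim=2))
```

Like other K-centroid-type procedures, a sketch of this kind depends on the random initial partition and can stop at a local optimum, so in practice it would be run from several starting points and the resulting partitions compared with a model selection criterion.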