Article overview
VARCLUST: clustering variables using dimensionality reduction
Authors: Piotr Sobczyk; Stanislaw Wilczynski; Malgorzata Bogdan; Piotr Graczyk; Julie Josse; Fabien Panloup; Valérie Seegers; Mateusz Staniak
Date: 12 Nov 2020
Abstract: The VARCLUST algorithm is proposed for clustering variables under the assumption
that variables in a given cluster are linear combinations of a small number of
latent variables, corrupted by random noise. The entire clustering
task is viewed as a statistical model selection problem, where the model is
defined by the number of clusters, the partition of variables into these
clusters and the ‘cluster dimensions’, i.e. the vector of dimensions of the linear
subspaces spanning each of the clusters. The optimal model is selected using
an approximate Bayesian criterion based on Laplace approximations, with
a non-informative uniform prior on the number of clusters. To search
the huge space of possible models, we propose an
extension of the ClustOfVar algorithm, which was dedicated to subspaces of
dimension 1 only and which is similar in structure to the $K$-centroid
algorithm. We provide a complete methodology with theoretical guarantees,
extensive numerical experiments, complete data analyses and an
implementation. Our algorithm assigns variables to appropriate clusters based
on the consistent Bayesian Information Criterion (BIC), and estimates the
dimensionality of each cluster by the PEnalized SEmi-integrated Likelihood
Criterion (PESEL), whose consistency we prove. Additionally, we prove that each
iteration of our algorithm leads to an increase of the Laplace approximation to
the model posterior probability and provide the criterion for the estimation of
the number of clusters. Numerical comparisons with other algorithms show that
VARCLUST may outperform some popular machine learning tools for sparse subspace
clustering. We also report the results of real data analysis including TCGA
breast cancer data and meteorological data. The proposed method is implemented
in the publicly available R package varclust.
Source: arXiv, 2011.06501
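The alternating structure described in the abstract (fit a low-dimensional subspace to each cluster, then reassign every variable to the best-fitting subspace) can be illustrated with a minimal Python sketch. This is not the authors' implementation and not the API of the varclust R package. It rests on simplifying assumptions: the cluster dimension `dim` is fixed and shared by all clusters (no PESEL step), and variables are assigned by the squared norm of their projection onto each subspace rather than by BIC. The names `cluster_bases`, `subspace_kmeans` and `init_labels` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def cluster_bases(X, labels, n_clusters, dim, rng):
    """Return one orthonormal basis (n x dim) per cluster.

    The basis is given by the leading left singular vectors of the
    columns (variables) currently assigned to that cluster.
    """
    n = X.shape[0]
    bases = []
    for k in range(n_clusters):
        block = X[:, labels == k]
        if block.shape[1] < dim:
            # Assumption of this sketch: a cluster with too few variables
            # is re-seeded with a random orthonormal basis.
            q, _ = np.linalg.qr(rng.standard_normal((n, dim)))
            bases.append(q)
        else:
            u, _, _ = np.linalg.svd(block, full_matrices=False)
            bases.append(u[:, :dim])
    return bases


def subspace_kmeans(X, n_clusters, dim, n_iter=50, seed=0, init_labels=None):
    """Cluster the columns of X around `n_clusters` subspaces of size `dim`.

    Simplified stand-in for the scheme in the abstract: fixed `dim`,
    projection-based assignment instead of BIC. Assumes n >= dim and
    that no column of X is constant.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)             # centre each variable
    X = X / np.linalg.norm(X, axis=0)  # scale each variable to unit norm
    if init_labels is None:
        labels = rng.integers(n_clusters, size=X.shape[1])
    else:
        labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        bases = cluster_bases(X, labels, n_clusters, dim, rng)
        # fit[k, j] = squared norm of variable j projected onto subspace k
        fit = np.stack([np.sum((B.T @ X) ** 2, axis=0) for B in bases])
        new_labels = fit.argmax(axis=0)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
    return labels
```

A call such as `subspace_kmeans(X, n_clusters=2, dim=2)` returns one integer label per column of `X`. Like any K-means-type procedure, the result depends on the initial partition, so in practice one would rerun it from several random starts and keep the best partition according to the chosen criterion.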