Re: [R] Hierarchical Cluster Analysis with large dataset

Ranjan Maitra Sun, 03 Nov 2013 06:05:28 -0800

On Sun, 3 Nov 2013 10:42:06 +0100 Petar Milin
<petar.mi...@uni-tuebingen.de> wrote:


> Hello!
> Can anyone give me advice on running Hierarchical Cluster Analysis on large
> datasets? For example, 80000x10000. Calculating distances on such a
> dataframe seems impossible even on a very powerful computer.
> 
> Also, any other advice that would lead to a reduction of dimensionality,
> i.e., clustering/grouping variables, would be more than welcome.

You have two different issues here: the size of the dataset (the number
of observations, which prevents storing the distance matrix in memory)
and the number of variables (which does not, but probably hinders
reading in the dataset).
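
To put rough numbers on the two issues, here is a back-of-the-envelope
sketch. It is plain arithmetic (written in Python only for illustration)
and rests on assumptions not stated in the question: that the 80000 rows
are the observations, that everything is stored as 8-byte doubles, and
that only the lower triangle of the distance matrix is kept.

```python
# Back-of-the-envelope memory estimate; assumes 8-byte doubles throughout.
n_obs, n_vars = 80_000, 10_000   # dimensions quoted in the question
BYTES_PER_DOUBLE = 8             # assumption: double-precision storage

# Lower triangle of the pairwise distance matrix: n * (n - 1) / 2 entries.
n_pairs = n_obs * (n_obs - 1) // 2
dist_bytes = n_pairs * BYTES_PER_DOUBLE

# The raw data held as a dense numeric matrix.
data_bytes = n_obs * n_vars * BYTES_PER_DOUBLE

print(f"pairwise distances: {n_pairs:,} values, about {dist_bytes / 1e9:.1f} GB")
print(f"data matrix:        {n_obs * n_vars:,} values, about {data_bytes / 1e9:.1f} GB")
```

Under these assumptions the distances alone need roughly 25.6 GB and the
data about 6.4 GB. The number of variables does not change the size of
the distance matrix, only the cost of computing each entry.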

You need to provide more information here: why do you need or want to
do hierarchical clustering and, if you do, must you use R? What
hardware do you have at your disposal? And so on.

Depending on your answers to the above, this may well be a research
problem in its own right.

HTH!

Best wishes,
Ranjan

> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-- 
Important Notice: This mailbox is ignored: e-mails are set to be
deleted on receipt. Please respond to the mailing list if appropriate.
For those needing to send personal or professional e-mail, please use
appropriate addresses.