As Burt pointed out, your plan is not advisable (that is putting it
diplomatically) and it is not really an R question, but we can use R to show
you why it is not advisable. What you are doing is inherently circular: you use
the data to create groups and then you test the groups against the same data
you used to create them. Hotelling's T-squared test assumes that the groups
were defined independently of the data; its null hypothesis is that the group
mean vectors are equal. Groups produced by clustering are built to differ, so
the test will almost always reject.
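A minimal base-R sketch of the circularity, assuming normally distributed noise. hotelling_t2() is a hypothetical helper written for this illustration (it is not the DescTools function); it implements the standard two-sample T-squared with a pooled covariance matrix and the usual F transformation. The sketch repeats the hclust()/cutree() experiment many times on pure noise and counts how often p < 0.05, once with groups taken from cutree() and once with groups assigned at random.

```r
# Hypothetical helper (not DescTools::HotellingsT2Test): two-sample
# Hotelling's T-squared with a pooled covariance matrix, returned together
# with its F transformation and p-value.
hotelling_t2 <- function(x, g) {
  x <- as.matrix(x)
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  x1 <- x[g == levels(g)[1], , drop = FALSE]
  x2 <- x[g == levels(g)[2], , drop = FALSE]
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x)
  d <- colMeans(x1) - colMeans(x2)
  # Pooled within-group covariance; crossprod() of centred rows also
  # handles a group with a single member (it contributes zero).
  ss1 <- crossprod(sweep(x1, 2, colMeans(x1)))
  ss2 <- crossprod(sweep(x2, 2, colMeans(x2)))
  S <- (ss1 + ss2) / (n1 + n2 - 2)
  t2 <- (n1 * n2 / (n1 + n2)) * sum(d * solve(S, d))
  df2 <- n1 + n2 - p - 1
  f <- df2 / ((n1 + n2 - 2) * p) * t2
  list(T2 = t2, F = f, df1 = p, df2 = df2,
       p.value = pf(f, p, df2, lower.tail = FALSE))
}

# One experiment on pure noise: 25 observations, 4 variables.
one_run <- function(split) {
  x <- matrix(rnorm(25 * 4), 25, 4)
  grps <- if (split == "cutree") {
    cutree(hclust(dist(x), method = "ward.D2"), 2)  # groups built from x
  } else {
    sample(rep(1:2, length.out = 25))               # groups ignore x
  }
  hotelling_t2(x, grps)$p.value
}

set.seed(1)
pvals_clustered <- replicate(500, one_run("cutree"))
pvals_random    <- replicate(500, one_run("random"))

# Proportion of "significant" results at the 5% level
c(cutree = mean(pvals_clustered < 0.05), random = mean(pvals_random < 0.05))
```

The random split should reject close to the nominal 5%, while the cutree() split should reject almost every time even though neither data set has any structure; the exact proportions depend on the seed.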
> set.seed(42)
> x <- matrix(rnorm(25*4), 25, 4)
> x.hcl <- hclust(dist(x), method="ward.D2")
> plot(x.hcl)
Now you have a dendrogram showing three nice-looking clusters that are based on
completely random numbers. Unless the pseudorandom number generator is flawed,
there is no structure in these data, but the dendrogram looks plausible. We
need 2 groups for Hotelling's T-squared:
> grps <- cutree(x.hcl, 2)
> library(DescTools)
> HotellingsT2Test(x~grps)
Hotelling's two sample T2-test
data: x by grps
T.2 = 8.3476, df1 = 4, df2 = 20, p-value = 0.0003947
alternative hypothesis: true location difference is not equal to c(0,0,0,0)
No surprise. There is a significant difference between the groups. That just
tells us that hclust() is working properly. It tells us exactly nothing about
any structure or pattern in the data (there is none). An equally bad (but
surprisingly common) approach is to use linear discriminant analysis. Here we
will use 3 groups:
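> grps <- cutree(x.hcl, 3)
> library(MASS)
> x.lda <- lda(x, grps)
> x.pre <- predict(x.lda)
> # group means of the discriminant scores (column 1 holds the group label)
> centers <- aggregate(x.pre$x, list(grps), mean)
> plot(x.lda)
> for (i in 1:3) {
+   segments(centers[i, 2], centers[i, 3],
+            x.pre$x[grps == i, 1], x.pre$x[grps == i, 2], lty = 2)
+ }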
Now we have 3 well-separated clusters created from completely random data.
Hierarchical clustering always creates clusters. It does not question the data
you provide and it does not stop and refuse to continue if there are no
clusters in the data.
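A hedged sketch of the same point for LDA, assuming the MASS package used above is installed. It compares resubstitution accuracy (lda() classifying the very observations it was fitted on) when the groups come from cutree() versus from labels assigned at random. resub_accuracy() is a hypothetical helper defined only for this illustration.

```r
library(MASS)  # recommended package shipped with R; provides lda()

# Hypothetical helper: fraction of rows that lda() assigns back to the
# group it was fitted with (optimistic by construction).
resub_accuracy <- function(x, grps) {
  grps <- factor(grps)
  fit <- lda(x, grouping = grps)
  mean(predict(fit, x)$class == grps)  # pass newdata explicitly
}

set.seed(2)
acc <- t(replicate(100, {
  x <- matrix(rnorm(25 * 4), 25, 4)                       # pure noise
  clus <- cutree(hclust(dist(x), method = "ward.D2"), 3)  # labels built from x
  rand <- sample(rep(1:3, length.out = 25))               # labels ignore x
  c(clustered = resub_accuracy(x, clus), random = resub_accuracy(x, rand))
}))

colMeans(acc)  # average resubstitution accuracy for each kind of label
```

Even random labels are recovered somewhat better than chance because the same observations are used to fit and to evaluate, but the cutree() labels should be recovered far more often, although both sets of labels describe the same structureless noise. Exact values depend on the seed.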
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:[email protected]] On Behalf Of Michael
Sent: Friday, April 8, 2016 8:55 AM
To: [email protected]
Subject: [R] Generating Hotelling's T squared statistic with hclust
I am doing a cluster analysis with hclust. I want to get hclust to output the
Hotelling's T squared statistic for each cluster so I can evaluate whether data
points should be in a cluster or not. My research to answer this question has
been unsuccessful. Does anyone know how to get hclust to output the
Hotelling's T squared statistic for each cluster?
Mike
______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.