Re: [R] Generating Hotelling's T squared statistic with hclust

David L Carlson Fri, 08 Apr 2016 11:15:07 -0700

As Burt pointed out, your plan is not advisable (that is putting it 
diplomatically) and not about R, but we can use R to show you why it is not 
advisable. What you are doing is inherently circular. You use the data to 
create groups and then you test the groups against the data you used to create 
them. Hotelling's T-squared tests whether the group mean vectors are equal, and 
its p-value is valid only if the groups were defined independently of the data.
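
One way to see the circularity is a small simulation on pure noise. The sketch below is only an illustration under stated assumptions: hotelling_p() is a hypothetical helper written here in base R (pooled-covariance two-sample T-squared with the usual F transformation), not a function from any package, and the 200 replicates and the 0.05 cutoff are arbitrary choices. It compares the rejection rate when the groups come from cutree() with the rate when the same group sizes are assigned at random.

```r
# Hypothetical helper: two-sample Hotelling's T-squared p-value in base R.
# Assumes g contains the labels 1 and 2 and that n1 + n2 - 2 >= ncol(x).
hotelling_p <- function(x, g) {
  x1 <- x[g == 1, , drop = FALSE]
  x2 <- x[g == 2, , drop = FALSE]
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x)
  d  <- colMeans(x1) - colMeans(x2)
  # Pooled covariance from centered cross-products
  # (this form also works when a group has a single row)
  ss <- crossprod(scale(x1, scale = FALSE)) +
        crossprod(scale(x2, scale = FALSE))
  s_pooled <- ss / (n1 + n2 - 2)
  t2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(s_pooled, d))
  # F transformation of T-squared with df1 = p, df2 = n1 + n2 - p - 1
  f <- t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
  pf(f, p, n1 + n2 - p - 1, lower.tail = FALSE)
}

set.seed(1)
n_sim <- 200
p_cluster <- numeric(n_sim)
p_random  <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  x <- matrix(rnorm(25 * 4), 25, 4)          # no structure by construction
  g_cluster <- cutree(hclust(dist(x), method = "ward.D2"), 2)
  g_random  <- sample(g_cluster)             # same sizes, unrelated to x
  p_cluster[i] <- hotelling_p(x, g_cluster)
  p_random[i]  <- hotelling_p(x, g_random)
}

# Proportion of "significant" results at the 0.05 level
mean(p_cluster < 0.05)   # groups derived from the data
mean(p_random  < 0.05)   # groups independent of the data
```

If the reasoning above holds, the first proportion should be far larger than the second, which should sit near the nominal 0.05.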


> set.seed(42)
> x <- matrix(rnorm(25*4), 25, 4)
> x.hcl <- hclust(dist(x), method="ward.D2")
> plot(x.hcl)

Now you have a dendrogram showing three nice looking clusters that are based on 
completely random numbers. Unless the pseudo random number function is flawed, 
there is no structure in these data, but the dendrogram looks plausible. We 
need 2 groups for Hotelling's T:

> grps <- cutree(x.hcl, 2)
> library(DescTools)
> HotellingsT2Test(x~grps)

        Hotelling's two sample T2-test

data:  x by grps
T.2 = 8.3476, df1 = 4, df2 = 20, p-value = 0.0003947
alternative hypothesis: true location difference is not equal to c(0,0,0,0)

No surprise. There is a significant difference between the groups. That just 
tells us that hclust() is working properly. It tells us exactly nothing about 
any structure or pattern in the data (there is none). An equally bad (but 
surprisingly common) approach is to use linear discriminant analysis. Here we 
will use 3 groups:

> grps <- cutree(x.hcl, 3)
> library(MASS)
> x.lda <- lda(x, grps)
> x.pre <- predict(x.lda)
> plot(x.lda)
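> # group centroids in discriminant space (columns: Group.1, LD1, LD2)
> centers <- aggregate(x.pre$x, list(grps), mean)
> for (i in 1:3) { segments(centers[i, 2], centers[i, 3], 
+      x.pre$x[grps==i, 1], x.pre$x[grps==i, 2], lty=2)
+ }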

Now we have 3 well-separated clusters created from completely random data. 
Hierarchical clustering always creates clusters. It does not question the data 
you provide and it does not stop and refuse to continue if there are no 
clusters in the data.
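
The same point can be made numerically instead of graphically. The sketch below is a self-contained illustration that repeats the setup from the example above and cross-tabulates the cluster labels against the classes that lda() assigns to the very rows it was fitted on (resubstitution). The object name acc is introduced here for illustration only.

```r
library(MASS)

set.seed(42)
x <- matrix(rnorm(25 * 4), 25, 4)             # random data, no real groups
grps <- cutree(hclust(dist(x), method = "ward.D2"), 3)

x.lda <- lda(x, grps)
pred  <- predict(x.lda)$class                 # classify the rows used for fitting

# Cluster labels against LDA-assigned classes
table(cluster = grps, predicted = pred)

# Proportion of rows assigned back to their own cluster
acc <- mean(pred == grps)
acc
```

A high agreement here says only that the discriminant functions can recover the partition that hclust() produced from the same data. It is not evidence that the partition reflects real structure.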

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael
Sent: Friday, April 8, 2016 8:55 AM
To: r-help@r-project.org
Subject: [R] Generating Hotelling's T squared statistic with hclust

I am doing a cluster analysis with hclust.  I want to get hclust to output the 
Hotelling's T squared statistic for each cluster so I can evaluate if data 
points should be in a cluster or not.  My research to answer this question has 
been unsuccessful.  Does anyone know how to get hclust to output the 
Hotelling's T squared statistic for each cluster?


Mike


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
