>>>>> Anna F... >>>>> on Thu, 1 May 2014 22:09:28 +0000 writes:
> Hi Martin,
>
> I am a statistician at National Jewish Health in Colorado, and I have been working on clustering a dataset using Ward's minimum variance method. When plotting the dendrogram, the y-axis is labeled 'height'. Can you explain to me (or point me in the right direction) how this distance between merging clusters is calculated for the Ward method? I have found the calculation that SAS uses, and I want to check whether it is the same in your method.
>
> Here is a summary of the code I am using:
>
> Agnes(x,method="ward",diss=TRUE)

Well, as R is case sensitive, it must be

    agnes(x,method="ward",diss=TRUE)

Interestingly, the new version of R, R 3.1.0, now has two different versions of Ward's method in hclust(), method = "ward.D" and method = "ward.D2"; see

    http://stat.ethz.ch/R-manual/R-patched/library/stats/html/hclust.html

where it is stated that previously hclust() was not really using Ward's method unless the user called it in a specific way (namely, on squared dissimilarities), whereas agnes() did and still does.

*The* reference for all basic routines in the 'cluster' package is

    Kaufman, L. and Rousseeuw, P.J. (1990). _Finding Groups in Data: An Introduction to Cluster Analysis_. Wiley, New York.

Alternatively, the source code of R and of all packages is open. For the cluster package, you can either get it from cluster_*.tar.gz on CRAN, or look at the (subversion) development version at http://svn.r-project.org/

Specifically, the C code which computes agnes() is

    https://svn.r-project.org/R-packages/trunk/cluster/src/twins.c

and there, the branch

    case 4: /* 4: ward's method */
        ta = (double) kwan[la];
        tb = (double) kwan[lb];
        tq = (double) kwan[lq];
        fa = (ta + tq) / (ta + tb + tq);
        fb = (tb + tq) / (ta + tb + tq);
        fc = -tq / (ta + tb + tq);
        int nab = ind_2(la, lb);
        dys[naq] = sqrt(fa * dys[naq] * dys[naq] +
                        fb * dys[nbq] * dys[nbq] +
                        fc * dys[nab] * dys[nab]);
        break;

contains the distance calculation for Ward's method. Here kwan[] holds the current cluster sizes, la and lb are the two clusters being merged, and lq is another cluster: the new dissimilarity between the merged cluster and lq is the Lance-Williams update for Ward's method applied to the squared dissimilarities, followed by a square root. The 'height' of a merge in the dendrogram is the dissimilarity between the two clusters at the stage where they are merged (the 'height' component of the agnes object).
...
[ In private communication, Anna agreed that I reply publicly to R-help so that others can chime in and everything will be searchable for people with a similar question. MM ]

Best regards,
Martin Maechler

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.