Dear all,

I need to run a cluster analysis on a large data set (two or three variables for 6,000 cases). In the end I need to assign the source data to the resulting clusters, so I need to trace the sequence in which the cases are fused. With that sequence I can fill in a cluster, at any level of linkage distance, with its data.

This procedure is implemented in Statistica, but that package cannot work with data sets of this size. I attach an example for a small sample: the figure is the cluster tree, and the text file and Excel file are the "Amalgamation Schedule" (joining matrix).

http://r.789695.n4.nabble.com/file/n4319741/Tree_Diagram_for_61_Cases.jpg
http://r.789695.n4.nabble.com/file/n4319741/Amalgamation_Schedule_(test).txt Amalgamation_Schedule_(test).txt
http://r.789695.n4.nabble.com/file/n4319741/Amalgamation_Schedule.xls Amalgamation_Schedule.xls

My code:

x <- read.table('test.csv', sep = ',', header = TRUE)
x <- x[-1]  # drop the first column
# "ward" is not a distance measure, so it goes to hclust(), not dist()
d <- dist(x, method = "euclidean")
# In R >= 3.1.0 Ward linkage is "ward.D" or "ward.D2"; older versions use "ward"
hc <- hclust(d, method = "ward.D2")
plot(hc)
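To show the kind of table I am after, here is a rough sketch of how the schedule might be read from an hclust object. It assumes Euclidean distances and Ward linkage ("ward.D2", available in R >= 3.1.0). The data are synthetic, and the names schedule, members and h_cut are only illustrative.

```r
# Synthetic stand-in for the real data (hypothetical): 6 cases, 2 variables
set.seed(1)
x <- data.frame(v1 = rnorm(6), v2 = rnorm(6))

d  <- dist(x, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# One row per fusion step. In hc$merge a negative entry is an original case,
# a positive entry k is the cluster that was formed at step k.
schedule <- data.frame(
  step   = seq_len(nrow(hc$merge)),
  item1  = hc$merge[, 1],
  item2  = hc$merge[, 2],
  height = hc$height
)

# Expand each step into the original cases that the new cluster contains
members <- vector("list", nrow(hc$merge))
for (i in seq_len(nrow(hc$merge))) {
  parts <- lapply(hc$merge[i, ], function(k) if (k < 0) -k else members[[k]])
  members[[i]] <- sort(unlist(parts))
}
schedule$cases <- vapply(members, paste, character(1), collapse = ", ")

# Cluster membership at a chosen linkage distance
# (h_cut is an arbitrary example value, not a recommendation)
h_cut <- median(hc$height)
x$cluster <- cutree(hc, h = h_cut)

print(schedule)
```

With 6,000 cases the cases column gets very long in the last steps, so it may be better to keep only item1, item2 and height and to use cutree() for the levels of interest.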
Greatly sorry for my English. Thank you.