Dear members,

I am having trouble understanding the kmeans results in R. I am applying the kmeans algorithm to a large data file, and it produces the cluster results shown below.

Q1) Does anybody know how to find out which data points have been assigned to which cluster? (I have fixed the number of clusters at 5.) The command I used is:

kmeans.results <- kmeans(mydata, centers = 5, iter.max = 1000, nstart = 10000)

Q2) When I call kmeans.results I get the following output:

K-means clustering with 5 clusters of sizes 17, 1, 6, 4, 32

Cluster means:
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]     [,11]        [,12]
1    0    0    0    0    0    0    0    0    0     0 0.0000000 0.0008235294
2    0    0    0    0    0    0    0    0    0     0 0.0000000 0.0000000000
3    0    0    0    0    0    0    0    0    0     0 0.0000000 0.0000000000
4    0    0    0    0    0    0    0    0    0     0 0.0000000 0.0040000000
5    0    0    0    0    0    0    0    0    0     0 0.0003125 0.0003750000
         [,13]       [,14]       [,15]       [,16]       [,17]      [,18]
1 0.0008235294 0.001176471 0.005176471 0.012471295 0.041181652 0.10663935
2 0.0000000000 0.000000000 0.000000000 0.000000000 0.169491525 0.61016949
3 0.0000000000 0.000000000 0.000000000 0.002333333 0.006666667 0.07695015
4 0.0030000000 0.001500000 0.001000000 0.017500000 0.029000000 0.06150000
5 0.0015625000 0.003437500 0.010687500 0.046375000 0.100062500 0.14306250
       [,19]     [,20]     [,21]     [,22]      [,23]      [,24]       [,25]
1 0.12946535 1.0017347 0.3360283 0.2455259 0.08565672 0.02553212 0.006000000
2 0.94915254 0.1694915 0.1016949 0.0000000 0.00000000 0.00000000 0.000000000
3 0.09376439 1.3857837 0.2659812 0.1015707 0.03804953 0.02023362 0.007666667
4 0.17100000 0.6665000 0.7860000 0.1860000 0.04650000 0.01450000 0.012000000
5 0.18100000 0.5200625 0.4156875 0.3461250 0.16925000 0.04918750 0.011500000
         [,26]       [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
1 0.0005882353 0.001176471     0     0     0     0     0     0     0     0
2 0.0000000000 0.000000000     0     0     0     0     0     0     0     0
3 0.0010000000 0.000000000     0     0     0     0     0     0     0     0
4 0.0000000000 0.000000000     0     0     0     0     0     0     0     0
5 0.0013125000 0.000000000     0     0     0     0     0     0     0     0
  [,36] [,37] [,38] [,39] [,40]
1     0     0     0     0     0
2     0     0     0     0     0
3     0     0     0     0     0
4     0     0     0     0     0
5     0     0     0     0     0

Clustering vector:
 [1] 1 5 5 3 1 5 5 5 5 1 4 1 5 5 5 5 4 5 2 3 5 5 1 5 5 5 5 1 3 1 4 5 5 1 5 5 5 1
[39] 3 1 5 5 3 1 1 1 1 5 5 1 4 1 3 5 5 5 5 5 5 1

Within cluster sum of squares by cluster:
[1] 0.6702803 0.0000000 0.2453294 0.1860180 1.3535263
 (between_SS / total_SS =  76.8 %)

Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"

Q3) I would like to understand which raw data are in which cluster. Does somebody know how to access the table of raw data that fall in the same cluster?

Thanks for your help,
DZU
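
For readers with the same question, here is a minimal sketch of one way to look up cluster membership (Q1 and Q3). It relies on the "cluster" component listed in the output above, which holds one integer label per row of the input, in row order. The toy matrix, the two-cluster setup, and the names km, labelled, cluster1, by_cluster and rows_by_cluster are assumptions made for illustration; they are not taken from the original data.

```r
# Toy data standing in for 'mydata' (hypothetical values, two obvious groups).
set.seed(42)
mydata <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                matrix(rnorm(40, mean = 5), ncol = 2))

km <- kmeans(mydata, centers = 2, nstart = 10)

# km$cluster gives the cluster label of each row of mydata, in row order,
# so it can be attached to the raw data as an extra column.
labelled <- data.frame(mydata, cluster = km$cluster)

# Raw rows that were assigned to cluster 1 only.
cluster1 <- mydata[km$cluster == 1, , drop = FALSE]

# All clusters at once: a list with one data frame of raw rows per cluster.
by_cluster <- split(as.data.frame(mydata), km$cluster)

# Row indices of the original data, grouped by cluster.
rows_by_cluster <- split(seq_len(nrow(mydata)), km$cluster)
```

With the object from the post, the same idea would read mydata[kmeans.results$cluster == 2, , drop = FALSE] to see the single observation that forms cluster 2, assuming mydata is a matrix or data frame with one observation per row.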