On Thu, Mar 31, 2011 at 11:48 AM, Hans Ekbrand <h...@sociologi.cjb.net> wrote:
>
> The variables are unordered factors, stored as integers 1:9, where
>
> 1 means "Full-time employment"
> 2 means "Part-time employment"
> 3 means "Student"
> 4 means "Full-time self-employee"
> ...
>
> Does euclidean distances make sense on unordered factors coded as
> integers?

It probably doesn't. You said you have some 36 observations for each
case, correct? You can turn these 36 observations into a vector of
length 36 * 9 on which Euclidean distance does make some sense: k
differing observations produce a distance of sqrt(2*k).

For each observation with value p (p between 1 and 9), create a vector
r = c(0,0,1,0,...,0) of length 9 whose only 1 is in the p-th component.
If two values p1 and p2 are the same, the Euclidean distance between r1
and r2 is zero; if they are not the same, the Euclidean distance is
sqrt(2).

Here's some possible R code:

# Turn a vector of integer codes (1..maxVal) into one long 0/1 vector:
# each code p becomes the p-th column of a maxVal x maxVal identity matrix.
transform = function(obsVector, maxVal)
{
  templateMat = matrix(0, maxVal, maxVal);
  diag(templateMat) = 1;
  return(as.vector(templateMat[, obsVector]));
}

set.seed(10)
n = 4;       # number of cases (columns)
m = 5;       # number of observations per case (rows)
maxVal = 4;  # number of factor levels
data = matrix(sample(c(1:maxVal), n*m, replace = TRUE), m, n);

> data
     [,1] [,2] [,3] [,4]
[1,]    3    3    1    2
[2,]    1    3    3    2
[3,]    3    3    2    4
[4,]    1    2    4    2
[5,]    4    1    4    1

trafoData = apply(data, 2, transform, maxVal = maxVal);

> trafoData
      [,1] [,2] [,3] [,4]
 [1,]    0    0    1    0
 [2,]    0    0    0    1
 [3,]    1    1    0    0
 [4,]    0    0    0    0
 [5,]    1    0    0    0
 [6,]    0    0    0    1
 [7,]    0    1    1    0
 [8,]    0    0    0    0
 [9,]    0    0    0    0
[10,]    0    0    1    0
[11,]    1    1    0    0
[12,]    0    0    0    1
[13,]    1    0    0    0
[14,]    0    1    0    1
[15,]    0    0    0    0
[16,]    0    0    1    0
[17,]    0    1    0    1
[18,]    0    0    0    0
[19,]    0    0    0    0
[20,]    1    0    1    0

The code assumes that cases are in columns and observations in rows of
data. Each observation becomes a block of maxVal rows in trafoData, so
rows 1-4 encode the first observation, rows 5-8 the second, and so on.
Examine data and trafoData to see how the transformation works.

Once you have the transformed data, simply apply your favorite
clustering method that uses Euclidean distance.

HTH,

Peter
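
P.S. As a quick check of the sqrt(2*k) claim, here is a minimal sketch. The helper oneHot() and the two example cases caseA and caseB are made up for illustration; oneHot() is only meant to do the same thing as transform() above, redefined so the snippet runs on its own.

```r
# Hypothetical helper: one-hot encode integer codes in 1..nLevels.
# diag(nLevels) is the identity matrix; picking its columns by code
# and flattening gives one 0/1 block of length nLevels per observation.
oneHot = function(obsVector, nLevels) {
  as.vector(diag(nLevels)[, obsVector])
}

# Two made-up cases with 6 observations each, codes in 1..9.
# They differ in exactly two positions (the 2nd and the 5th).
caseA = c(1, 3, 3, 9, 2, 5)
caseB = c(1, 4, 3, 9, 7, 5)

# dist() compares rows, so put one encoded case per row.
encoded = rbind(oneHot(caseA, 9), oneHot(caseB, 9))

k = sum(caseA != caseB)        # number of differing observations
d = as.numeric(dist(encoded))  # Euclidean distance between the two cases

# The distance should equal sqrt(2 * k), which is 2 for k = 2.
stopifnot(isTRUE(all.equal(d, sqrt(2 * k))))
```

Because dist() works on rows while trafoData has cases in columns, the matrix would need a transpose first, e.g. hclust(dist(t(trafoData))) for hierarchical clustering.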

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.