So you just want to compare the distances from each point of your new
data to each of the centres and assign each point the number of its
closest centre, as in:
tCentre <- t(Centre)
clust <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))
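A minimal self-contained sketch of the apply() approach, using the two-centre toy example from the question. The function name nearest_centre_apply and the three test points are hypothetical; it assumes Centre is a k x p matrix (one centre per row) and NewData an n x p matrix with the same columns.

```r
# Hypothetical helper: label each row of NewData with the index of the
# closest row of Centre, using squared Euclidean distance.
nearest_centre_apply <- function(NewData, Centre) {
  tCentre <- t(Centre)  # p x k, so each column is one centre
  # For one observation x (length p), x - tCentre recycles x down every
  # column; colSums of the squares gives the k squared distances.
  apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))
}

Centre  <- matrix(c(0, 1, 0, 1), nrow = 2)  # centres (0,0) and (1,1)
NewData <- rbind(c(-0.1, 0.2), c(0.9, 1.1), c(0.6, 0.7))
nearest_centre_apply(NewData, Centre)
# [1] 1 2 2
```

which.min() returns the first minimum, so a point exactly halfway between two centres goes to the lower-numbered one.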
But since this apply() loop runs once per row of the new data, it can
be slow for huge data sets; looping over the (few) centres instead is
faster:
tNewData <- t(NewData)
clust <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))
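A sketch comparing the loop-over-centres version with a fully vectorised variant. Variant (b) is a swapped-in technique not from the reply: it expands the squared distance as ||x||^2 - 2 x.c + ||c||^2 and uses tcrossprod(). Both helper names are hypothetical, and ties.method = "first" is an added assumption so that ties resolve like which.min() rather than at random (the max.col() default).

```r
Centre  <- matrix(c(0, 1, 0, 1), nrow = 2)  # k x p
NewData <- rbind(c(-0.1, 0.2), c(0.9, 1.1), c(0.6, 0.7))  # n x p

# (a) The loop runs over the k centres instead of the n observations.
nearest_centre_maxcol <- function(NewData, Centre) {
  tNewData <- t(NewData)  # p x n
  # n x k matrix: column j holds squared distances to centre j
  D <- apply(Centre, 1, function(x) colSums((x - tNewData)^2))
  # the largest -D in each row is the smallest distance
  max.col(-D, ties.method = "first")
}

# (b) No R-level loop: ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2
nearest_centre_crossprod <- function(NewData, Centre) {
  D <- outer(rowSums(NewData^2), rowSums(Centre^2), "+") -
       2 * tcrossprod(NewData, Centre)  # n x k
  max.col(-D, ties.method = "first")
}

nearest_centre_maxcol(NewData, Centre)
# [1] 1 2 2
nearest_centre_crossprod(NewData, Centre)
# [1] 1 2 2
```

Variant (b) does everything in matrix algebra, but the subtraction can lose precision when the points lie far from the origin relative to their spread, so (a) is the safer default.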
Best,
Uwe Ligges
On 21.05.2013 13:19, HJ YAN wrote:
Dear R users
I have a matrix of cluster centres, e.g. 20 clusters each with 100
dimensions, so this matrix contains 20 rows * 100 columns of numeric
values.
I have collected new data (each with 100 numeric values) and would like to
keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data
fit into this grouping system, e.g. if an observation is closest to cluster 1,
then label it 'cluster 1'.
If the above matrix of centres is called 'Centre' (a 20*100 matrix) and my
new data 'NewData' has 500 observations, then using kmeans() will update the
centres:
kmeans(NewData, Centre)
I wondered if there are other R packages out there that can keep the centres
fixed and label each observation of my new data. Or do I have to write my own
function?
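One existing-package option, not mentioned in the reply and offered here only as a sketch: treat each fixed centre as a single labelled training point and use 1-nearest-neighbour classification from the recommended package class. knn1(train, test, cl) uses Euclidean distance and returns a factor; according to its documentation, ties are broken at random. The three test points are hypothetical.

```r
library(class)  # recommended package shipped with R

Centre  <- matrix(c(0, 1, 0, 1), nrow = 2)
NewData <- rbind(c(-0.1, 0.2), c(0.9, 1.1), c(0.6, 0.7))

# Each centre is one "training" point labelled with its row number, so the
# nearest-neighbour label is the index of the nearest fixed centre.
lab <- knn1(train = Centre, test = NewData, cl = factor(seq_len(nrow(Centre))))
lab
# [1] 1 2 2
# Levels: 1 2
as.integer(lab)  # plain integer cluster labels
```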
To illustrate my task using a simpler example:
I have
Centre<- matrix(c(0,1,0,1), nrow=2)
# the two created centres in a two-dimensional case are
Centre
     [,1] [,2]
[1,]    0    0
[2,]    1    1
NewData<-rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
NewData1 <- cbind(1:100, NewData)
colnames(NewData1)<-c("ID","x","y")
# my data
head(NewData1)
     ID          x          y
[1,]  1 -0.3974660  0.1541685
[2,]  2  0.5321347  0.2497867
[3,]  3  0.2550276  0.1691720
[4,]  4 -0.1162162  0.6754874
[5,]  5  0.1570996  0.1175119
[6,]  6  0.4816195 -0.6836226
## I'd like to have the outcome below (whilst keeping the two centres fixed):
       ID          x          y Cluster
 [1,]   1 -0.3974660  0.1541685       1
 [2,]   2  0.5321347  0.2497867       1
 [3,]   3  0.2550276  0.1691720       1
 [4,]   4 -0.1162162  0.6754874       1
...
[55,]  55  1.1570996  1.1175119       2
[56,]  56  1.4816195  1.6836226       2
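An end-to-end sketch that produces a table in the requested form, combining the question's toy data with the loop-over-centres assignment from the reply. The set.seed(1) call is an added assumption for reproducibility, so the numbers will not match those printed in the post; ties.method = "first" is likewise an assumption.

```r
set.seed(1)  # added for reproducibility; values differ from the post
Centre  <- matrix(c(0, 1, 0, 1), nrow = 2)
NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
NewData1 <- cbind(1:100, NewData)
colnames(NewData1) <- c("ID", "x", "y")

# Assign each observation to its nearest fixed centre (squared Euclidean)
tNewData <- t(NewData)
D <- apply(Centre, 1, function(x) colSums((x - tNewData)^2))  # 100 x 2
Cluster <- max.col(-D, ties.method = "first")

# Append the labels as a new column; the centres are never updated
result <- cbind(NewData1, Cluster = Cluster)
head(result)
table(result[, "Cluster"])  # number of points assigned to each centre
```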
P.S. I use Euclidean distance to calculate the distance matrix.
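Because the square root is monotone increasing, minimising the squared Euclidean distance picks the same centre as minimising the Euclidean distance itself, which is why the code in the reply never calls sqrt(). Writing x for one observation (length p) and c_j for the j-th row of Centre:

```latex
\hat{j}(x) = \arg\min_{j \in \{1,\dots,k\}} \lVert x - c_j \rVert_2
           = \arg\min_{j \in \{1,\dots,k\}} \sum_{i=1}^{p} (x_i - c_{ji})^2
```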
Many thanks in advance
HJ
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.