I am trying to compare some word lists, each of which has an associated set of numbers. I want to compare word list aa with bb and find only those words that are unique to bb, then compare bb with cc, and so on.
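For a single pair, the operation might look like the following minimal sketch. It assumes the aa and bb words and the bb numbers have been taken out of the data frame as plain character and numeric vectors; `only.bb`, `keep` and `res` are illustrative names only. `setdiff(bb, aa)` keeps the elements of bb that are absent from aa, and `%in%` then marks which positions of bb hold those words, giving a logical index as long as bb itself.

```r
# Illustrative copies of two word lists and the numbers belonging to bb
aa  <- c("cat", "dog", "horse", "cow")
bb  <- c("mouse", "dog", "cow", "pigeon")
bnn <- c(5, 6, 7, 8)

# Words that occur in bb but not in aa; argument order matters, since
# setdiff(aa, bb) would instead return the words unique to aa
only.bb <- setdiff(bb, aa)   # "mouse" and "pigeon"

# TRUE at every position of bb whose word is in only.bb
keep <- bb %in% only.bb

# Pair each unique word with its number
res <- data.frame(word = bb[keep], number = bnn[keep],
                  stringsAsFactors = FALSE)
res
```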
I thought I should be able to do this by using setdiff to get the unique words and then subsetting the data frame to get those words and their corresponding numbers, but I am misunderstanding something. When I run the code below, a) I get lots of warnings, and b) I get the correct results for 4 of the 5 comparisons. However, the comparison of the third list with the fourth (cc, dd) gives me an empty subset. Can anyone point out my error or suggest a better way to do this?

Thanks

============================================================================
mydata = data.frame(aa = Cs(cat, dog, horse, cow),
                    bb = c("mouse", "dog", "cow", "pigeon"),
                    cc = c("emu", "rat", "crow", "cow"),
                    dd = c("cow", "camel", "manatee", "parrot"),
                    ee = c("coat", "hat", "dog", "camel"),
                    ff = c("knife", "dog", "cow", "pigeon"),
                    ann = c(1, 2, 3, 4), bnn = c(5, 6, 7, 8),
                    cnn = c(9, 10, 11, 12), dnn = c(13, 14, 15, 16),
                    enn = c(17, 18, 19, 20), fnn = c(21, 22, 23, 24))

wordnames <- c("word", "number")
word.list <- rep(vector("list", 1), 5)

for (j in 1:5) {
    # words in list j+1 that are not in list j
    lone.word <- setdiff(mydata[, j+1], mydata[, j])
    lone.word
    # rows of list j+1, with its numbers (column j+7), that hold those words
    matching <- subset(mydata[, c(j+1, j+7)], mydata[, j+1] == lone.word)
    matching
    word.list[[j]] <- matching
    names(word.list[[j]]) <- wordnames
}
word.list
=============================================================================

R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_Canada.1252;LC_CTYPE=English_Canada.1252;LC_MONETARY=English_Canada.1252;LC_NUMERIC=C;LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Hmisc_3.4-2 gdata_2.3.1

loaded via a namespace (and not attached):
[1] cluster_1.11.9 grid_2.6.0 gtools_2.4.0 lattice_0.17-1

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
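A likely explanation, offered as a sketch rather than a definitive answer: `==` compares element by element and recycles the shorter vector, so `mydata[,j+1] == lone.word` only tests each word against whichever element of `lone.word` happens to line up with it. Whenever `lone.word` has 3 elements and the column has 4, R warns that the longer object length is not a multiple of the shorter object length, which would account for the warnings. For cc and dd, `lone.word` is camel, manatee, parrot, and each of those sits one position later in dd than in `lone.word`, so nothing lines up and the subset is empty. By the same reasoning the ee/ff comparison returns only knife and silently drops cow and pigeon. The `setdiff()` step itself looks fine; replacing `==` with `%in%`, which checks each word against all of `lone.word`, removes the dependence on position. The version below makes that change and assumes the same column layout as `mydata` (word columns 1 to 6, number columns 7 to 12). It builds the data with `c()` instead of Hmisc's `Cs()` so it runs without Hmisc, sets `stringsAsFactors = FALSE` so the word columns are character vectors, and labels each result with an illustrative name such as `dd_vs_cc`.

```r
# Example data from the question, built with c() instead of Hmisc's Cs()
mydata <- data.frame(
  aa = c("cat", "dog", "horse", "cow"),
  bb = c("mouse", "dog", "cow", "pigeon"),
  cc = c("emu", "rat", "crow", "cow"),
  dd = c("cow", "camel", "manatee", "parrot"),
  ee = c("coat", "hat", "dog", "camel"),
  ff = c("knife", "dog", "cow", "pigeon"),
  ann = c(1, 2, 3, 4), bnn = c(5, 6, 7, 8),
  cnn = c(9, 10, 11, 12), dnn = c(13, 14, 15, 16),
  enn = c(17, 18, 19, 20), fnn = c(21, 22, 23, 24),
  stringsAsFactors = FALSE
)

word.list <- vector("list", 5)
for (j in 1:5) {
  # Words in list j+1 that do not appear in list j (unchanged)
  lone.word <- setdiff(mydata[, j + 1], mydata[, j])
  # %in% tests every word of column j+1 against all of lone.word,
  # so the result does not depend on how the two vectors line up
  keep <- mydata[, j + 1] %in% lone.word
  # Word column j+1 together with its number column j+7
  matching <- mydata[keep, c(j + 1, j + 7)]
  names(matching) <- c("word", "number")
  word.list[[j]] <- matching
}
# Illustrative labels: "bb_vs_aa", "cc_vs_bb", ..., "ff_vs_ee"
names(word.list) <- paste(names(mydata)[2:6], "vs", names(mydata)[1:5],
                          sep = "_")
word.list
```

With this change the cc/dd entry should hold camel, manatee and parrot with the numbers 14, 15 and 16, and the ee/ff entry should hold knife, cow and pigeon.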