On Wed, Jun 13, 2012 at 03:16:57AM -0700, sathya7priya wrote:
> I have two data frames which has 3 columns each.My first data frame is large
> like this below
> "new.col ppm.p. freq.p."
> "1_3_diaminopropane 3.13859 5.67516"
> "1_3_diaminopropane 3.137 6.65388"
> "1_3_diaminopropane 3.13541 8.0142"
> "1_3_diaminopropane 3.13383 9.64184"
> "1_3_diaminopropane 3.12075 298.243"
> "1_3_diaminopropane 3.1152 44.6212"
> "1_3_diaminopropane 3.10528 337.852"
> "1_3_diaminopropane 3.09617 44.1467"
> "1_3_diaminopropane 3.08943 308.2"
> "1_3_diaminopropane 3.0807 7.47272"
> "1_3_diaminopropane 3.07912 5.6996"
[...]
> "2_amino_5_ethyl_1_3_4_thiadiazole 1.15306 116.661"
> "2_amino_5_ethyl_1_3_4_thiadiazole 1.14513 64.8014"
> "2_amino_5_ethyl_1_3_4_thiadiazole 1.13681 45.9263"
> "2_amino_5_ethyl_1_3_4_thiadiazole 1.12848 35.0817"
> "2_amino_5_ethyl_1_3_4_thiadiazole 0.000156828 127.55"
>
>
> And my second dataframe is like query which has limited rows
> "new.col ppm.p. freq.p."
> "unknown" 7.44687 7.1684
> "unknown" 4.81412 105.11
> I want to compare the second and third columns of both dataframe and see
> whether there are any identical values in them.
> My expected answer is that the second dataframe is similar to values of
> 1_amino_1_phenylmethyl_phosphonic_acidpeak in data frame 1.
Hi.

If you are looking for similar rather than identical values, you can
specify a tolerance and compare rows by their sum-of-squares (Euclidean)
distance. Since the second data frame is not large, a loop over its rows
may be used. For example

  # some data
  base <- data.frame(x1=letters[1:5],
                     x2=seq(1, 2, length.out=5),
                     x3=seq(1.2, 1.8, length.out=5))
  observed <- data.frame(x1=letters[6:8],
                         x2=c(1.4, 1.01, 1.27),
                         x3=c(1.6, 1.21, 1.37))

  # choose tolerance
  eps <- 0.05

  # inspect data: compare the numeric columns only
  mat1 <- as.matrix(base[, 2:3])
  mat2 <- as.matrix(observed[, 2:3])
  for (i in seq_len(nrow(observed))) {
      # squared distance from row i of observed to every row of base
      d2 <- rowSums(sweep(mat1, MARGIN=2, STATS=mat2[i, ])^2)
      j <- which(d2 <= eps^2)
      if (length(j) >= 1) cat("row", i, "is similar to row:", j, "\n")
  }

Hope this helps.

Petr Savicky.
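
The same tolerance test can also be wrapped in a function that returns the
matches as a data frame, including the name stored in the first column of
the large data frame, which is what the question asks for. This is only a
sketch: the function name find_similar, its arguments and the names of the
result columns are made up for illustration and are not part of the
original reply. Note also that ppm and freq in the question are on quite
different scales, so a single eps may need adjusting or the columns may
need rescaling first.

```r
# Hypothetical helper (not from the original post): collect every
# (query row, base row) pair whose Euclidean distance over the numeric
# columns is at most eps.
find_similar <- function(base, observed, cols = 2:3, eps = 0.05) {
  mat1 <- as.matrix(base[, cols, drop = FALSE])
  mat2 <- as.matrix(observed[, cols, drop = FALSE])
  hits <- lapply(seq_len(nrow(mat2)), function(i) {
    # squared distance from query row i to every row of the base
    d2 <- rowSums(sweep(mat1, MARGIN = 2, STATS = mat2[i, ])^2)
    j <- which(d2 <= eps^2)
    data.frame(query_row = rep(i, length(j)),
               base_row  = j,
               base_name = as.character(base[j, 1]),
               distance  = unname(sqrt(d2[j])),
               row.names = NULL,
               stringsAsFactors = FALSE)
  })
  # one row per match; zero rows if nothing is within eps
  do.call(rbind, hits)
}

# Same toy data as in the loop example
base <- data.frame(x1 = letters[1:5],
                   x2 = seq(1, 2, length.out = 5),
                   x3 = seq(1.2, 1.8, length.out = 5))
observed <- data.frame(x1 = letters[6:8],
                       x2 = c(1.4, 1.01, 1.27),
                       x3 = c(1.6, 1.21, 1.37))

res <- find_similar(base, observed, eps = 0.05)
print(res)
```

With the toy data, rows 2 and 3 of observed fall within eps of rows 1 and 2
of base (names "a" and "b"), and row 1 of observed has no match.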