Okay, here's some sample code:

ID = c(1,2,3,"A1",5,6,"A2",8,9,"A3")
fakedata = rnorm(10, 5, .5)
main.df = data.frame(ID,fakedata)

Results for my data frame:

> main.df
   ID fakedata
1   1 5.024332
2   2 4.752943
3   3 5.408618
4  A1 5.362838
5   5 5.158660
6   6 4.658235
7  A2 5.389601
8   8 4.998249
9   9 5.248517
10 A3 4.159490

> sample1.df = main.df[sample(nrow(main.df), 4), ]
> sample1.df
  ID fakedata
5  5 5.158660
9  9 5.248517
4 A1 5.362838
8  8 4.998249

Here's what happens when I put a comma before the variable ID:

> sample2.df = main.df[sample(nrow(main.df[! main.df[,"ID"] %in% sample1.df[,"ID"]]), 5),]
Error in `[.data.frame`(main.df, !main.df[, "ID"] %in% sample1.df[, "ID"]) :
  undefined columns selected

Here's what happens when I exclude the comma:

> sample2.df = main.df[sample(nrow(main.df[! main.df["ID"] %in% sample1.df["ID"]]), 5),]
> sample2.df
   ID fakedata
8   8 4.998249
1   1 5.024332
3   3 5.408618
5   5 5.158660
10 A3 4.159490

As you can see, one way I get nothing other than an error; the other way I get a sample that doesn't exclude rows that were already included in the 1st sample.

Thanks,
Matt Dubins
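
For reference, here is a minimal sketch of one way the second sample could be drawn so that it skips IDs already in the first sample. It rests on a reading of the two attempts above: in both, the inner `main.df[...]` has no comma after the logical condition, so the condition is used to select columns rather than rows. With `main.df[,"ID"]` the condition has 10 values for a 2-column data frame, hence "undefined columns selected". With `main.df["ID"]` the `%in%` comparison is made between two one-column data frames and appears to yield a single value rather than one per row, so nothing is filtered out and `sample()` draws from all 10 rows. The `set.seed()` call and the name `remaining.df` are assumptions added for illustration and are not part of the original code.

```r
# Rebuild the example data from the post.
# set.seed() is added here only so the sketch is reproducible.
set.seed(1)
ID <- c(1, 2, 3, "A1", 5, 6, "A2", 8, 9, "A3")
fakedata <- rnorm(10, 5, 0.5)
main.df <- data.frame(ID, fakedata)

# First sample: 4 rows drawn at random.
sample1.df <- main.df[sample(nrow(main.df), 4), ]

# Keep only the rows whose ID is NOT already in the first sample.
# The comma after the condition makes it a row index, not a column index.
# remaining.df is a hypothetical helper name.
remaining.df <- main.df[!(main.df$ID %in% sample1.df$ID), ]

# Second sample: 5 rows drawn from the remaining rows only.
sample2.df <- remaining.df[sample(nrow(remaining.df), 5), ]

# Check: this should be FALSE if the two samples share no IDs.
any(sample2.df$ID %in% sample1.df$ID)
```

Splitting the filtering step from the sampling step also means `sample()` indexes into the same data frame whose rows it counted, which the one-line versions above do not do.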