Okay, thanks to your help I figured it out and put the code in a function:
df.sample.exIDs = function(main.df, sample1.df, n, ID1.name, ID2.name) {
  main.ID1.notin.ID2 = main.df[!main.df[, ID1.name] %in%
                               sample1.df[, ID2.name], ]
  sample2.df = main.ID1.notin.ID2[sample(nrow(main.ID1.notin.ID2), si
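The function above is cut off in this listing, so here is a minimal sketch of how such a function could be completed. The filtering line follows the post. The `size = n` argument, the size check, and the return value are assumptions, not the poster's confirmed code.

```r
# Sketch only: the posted function is truncated, so the size check, the
# sampling step and the return value below are assumptions.
df.sample.exIDs = function(main.df, sample1.df, n, ID1.name, ID2.name) {
  # Keep the rows of main.df whose ID does not appear in the first sample
  main.ID1.notin.ID2 = main.df[!main.df[, ID1.name] %in% sample1.df[, ID2.name], ]
  # Assumed: stop early if fewer than n rows remain to draw from
  stopifnot(n <= nrow(main.ID1.notin.ID2))
  # Assumed: draw n of the remaining rows without replacement
  sample2.df = main.ID1.notin.ID2[sample(nrow(main.ID1.notin.ID2), size = n), ]
  sample2.df
}

# Usage on made-up data (the seed and the sample sizes are arbitrary)
set.seed(1)
ID = c(1,2,3,"A1",5,6,"A2",8,9,"A3")
main.df = data.frame(ID, fakedata = rnorm(10, 5, .5))
sample1.df = main.df[sample(nrow(main.df), size = 4), ]
sample2.df = df.sample.exIDs(main.df, sample1.df, 3, "ID", "ID")
```

Passing the column names as strings lets the same function work when the ID column is named differently in the two data frames.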
Okay, here's some sample code:
ID = c(1,2,3,"A1",5,6,"A2",8,9,"A3")
fakedata = rnorm(10, 5, .5)
main.df = data.frame(ID,fakedata)
Results for my data frame:
> main.df
   ID fakedata
1   1 5.024332
2   2 4.752943
3   3 5.408618
4  A1 5.362838
5   5 5.158660
6   6 4.658235
7
When I use that exact syntax (with the ID variable names in quotes within the
square brackets after a comma) it just doesn't work. Also, I'm looking for
a random sample, not all possible rows with ID values that don't match the
second data frame.
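The two requirements here, excluding IDs that are already in the first sample and then drawing a random sample rather than keeping every remaining row, can be handled as two separate steps. The sketch below uses the sample data from this thread. The seed, the sample sizes, and the name `remaining.df` are illustrative assumptions.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
ID = c(1,2,3,"A1",5,6,"A2",8,9,"A3")
fakedata = rnorm(10, 5, .5)
main.df = data.frame(ID, fakedata)

# A first random sample of 4 rows (size chosen for illustration)
sample1.df = main.df[sample(nrow(main.df), 4), ]

# Step 1: every row whose ID is NOT in the first sample.
# The trailing comma inside [ , ] selects whole rows, not a single column.
remaining.df = main.df[!main.df[, "ID"] %in% sample1.df[, "ID"], ]

# Step 2: a random subset of those rows, drawn by sampling row positions
sample2.df = remaining.df[sample(nrow(remaining.df), 3), ]
```

Step 1 alone returns all non-matching rows. The `sample()` call on the row count in step 2 is what turns that into a random sample.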
Hello,
Let's say I've drawn a random sample (sample1.df) from a large data frame
(main.df), and I want to create a second random sample (sample2.df) where
the values in its ID column *are not* in the equivalent ID column in the
first sample (sample1.df). How would I go about doing this?
In other