Hi all, I have a question about subsetting large data frames. I have two data frames, “catches” and “tows”, which both have the same 30 variables (columns). I would like to select the rows in “tows” for which the combination of a specific set of variables (six in the example below) is NOT matched by any row in “catches”. That is to say, the combination of these variables does not occur in “catches”. One or more of the individual values could be the same, but the combination as a whole would not be. This is confusing to explain, so here is a short example of what I mean:
Example data catches:

Row  Cruise  Order  Townumber  Towtype  Ship  Netlocation  Var1  Var2
1    22      1      4          A        B     S            X1    X2
2    22      1      4          A        B     S            X1    X2
3    22      1      4          BL       AM    S            X1    X2
4    22      1      4          BL       AM    S            X1    X2
5    260     1      4          BL       B     S            X1    X2
6    260     1      4          BL       B     S            X1    X2

Example data tows:

Row  Cruise  Order  Townumber  Towtype  Ship  Netlocation  Var1  Var2
1    22      1      4          A        B     S            X1    X2
2    400     1      4          BL       AM    S            X1    X2
3    260     1      4          BL       B     S            X1    X2
4    260     10     10         BL       B     S            X1    X2
5    22      99     4          BL       B     S            X1    X2

I would want to select rows 2, 4, and 5 from “tows” because their combinations of “cruise”, “order”, “townumber”, “towtype”, “ship”, and “netlocation” are not found in “catches”. All rows in “tows” are unique. Clear as mud? Sorry I couldn’t provide real data, but these datasets are quite large.

So far I have tried:

    New <- tows[(tows$cruise != catches$cruise) &
                (tows$order != catches$order) &
                (tows$townumber != catches$townumber) &
                (tows$towtype != catches$towtype) &
                (tows$ship != catches$ship) &
                (tows$netlocation != catches$netlocation), ]

But this didn’t work.

Thanks in advance for your time and help.

Dan
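
For reference, here is a minimal sketch of one way the selection described above might be done in base R. The attempt with `!=` compares the two data frames element by element (recycling the shorter vector) instead of testing whether a combination occurs anywhere in “catches”, and it also requires every column to differ at once. The sketch instead builds one composite key per row and tests membership with `%in%`. The helper `make_key`, the capitalised column names (taken from the example tables, not from the code in the post), and the `"\r"` separator are assumptions made for illustration; the separator must not occur in the real data.

```r
# Toy data copied from the example tables (Var1 and Var2 omitted for brevity).
catches <- data.frame(
  Cruise      = c(22, 22, 22, 22, 260, 260),
  Order       = c(1, 1, 1, 1, 1, 1),
  Townumber   = c(4, 4, 4, 4, 4, 4),
  Towtype     = c("A", "A", "BL", "BL", "BL", "BL"),
  Ship        = c("B", "B", "AM", "AM", "B", "B"),
  Netlocation = c("S", "S", "S", "S", "S", "S"),
  stringsAsFactors = FALSE
)
tows <- data.frame(
  Cruise      = c(22, 400, 260, 260, 22),
  Order       = c(1, 1, 1, 10, 99),
  Townumber   = c(4, 4, 4, 10, 4),
  Towtype     = c("A", "BL", "BL", "BL", "BL"),
  Ship        = c("B", "AM", "B", "B", "B"),
  Netlocation = c("S", "S", "S", "S", "S"),
  stringsAsFactors = FALSE
)

# Columns whose combination identifies a tow (assumed names, see lead-in).
key_cols <- c("Cruise", "Order", "Townumber", "Towtype", "Ship", "Netlocation")

# Hypothetical helper: paste the key columns of each row into one string.
# "\r" is assumed not to appear in any of the values.
make_key <- function(df, cols) do.call(paste, c(df[cols], sep = "\r"))

# Keep the rows of tows whose composite key never occurs in catches.
unmatched <- tows[!(make_key(tows, key_cols) %in% make_key(catches, key_cols)), ]
print(unmatched)
```

On the toy data this keeps rows 2, 4, and 5 of “tows”. Because each key is built once per row and `%in%` uses hashing internally, the approach should scale reasonably to large data frames, though that has not been checked against the real datasets.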