[R] subsetting large data frames.

hesicaia Sun, 07 Dec 2008 10:01:26 -0800

Hi all,
  I have a question regarding subsetting of large data frames. I have two
data frames, “catches” and “tows”, and both have the same 30 variables
(columns). I would like to select the rows in “tows” for which the
combination of 5 specific variables is NOT matched by any row in “catches”.
That is to say, the combination of these variables is unique to “tows”. One
or more of the variables could individually be the same, but the combination
would be unique. This is hard to put into words, so here is a short example
of what I mean:


Example data catches:

Row  Cruise  Order  Townumber  Towtype  Ship  Netlocation  Var1  Var2
1    22      1      4          A        B     S            X1    X2
2    22      1      4          A        B     S            X1    X2
3    22      1      4          BL       AM    S            X1    X2
4    22      1      4          BL       AM    S            X1    X2
5    260     1      4          BL       B     S            X1    X2
6    260     1      4          BL       B     S            X1    X2
 
Example data tows:

Row  Cruise  Order  Townumber  Towtype  Ship  Netlocation  Var1  Var2
1    22      1      4          A        B     S            X1    X2
2    400     1      4          BL       AM    S            X1    X2
3    260     1      4          BL       B     S            X1    X2
4    260     10     10         BL       B     S            X1    X2
5    22      99     4          BL       B     S            X1    X2

I would want to select rows 2, 4, and 5 from “tows” because the same
combination of “cruise”, “order”, “townumber”, “towtype”, “ship”, and
“netlocation” is not found in “catches”. All rows in data set “tows” are
unique. Clear as mud? Sorry I couldn’t provide real data, but these datasets
are quite large.

So far I have tried:

New<-tows[(tows$cruise != catches$cruise) & (tows$order != catches$order) &
(tows$townumber !=  catches$townumber) & (tows$towtype != catches$towtype) &
(tows$ship != catches$ship) & (tows$netlocation != catches$netlocation),]
 
But this didn’t work. 
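
A note on why the attempt above cannot give the intended result, followed by
a minimal sketch of one possible alternative. The != operator compares the
two columns element by element (row 1 with row 1, row 2 with row 2, recycling
the shorter vector), so it never asks whether a row of “tows” matches ANY row
of “catches”. The sketch instead pastes the key columns into one string per
row and tests membership with %in%. It is only an illustration under these
assumptions: the column names are the capitalised ones from the example
tables (the attempt above uses lowercase names, and R is case-sensitive), the
six columns listed earlier form the key, no value contains the "|" separator,
and make_key is a hypothetical helper name. Var1 and Var2 are left out for
brevity.

```r
# Key columns that identify a tow (names assumed from the example tables).
keys <- c("Cruise", "Order", "Townumber", "Towtype", "Ship", "Netlocation")

# Example data re-typed from the tables in the post (Var1/Var2 omitted).
catches <- data.frame(
  Cruise      = c(22, 22, 22, 22, 260, 260),
  Order       = c(1, 1, 1, 1, 1, 1),
  Townumber   = c(4, 4, 4, 4, 4, 4),
  Towtype     = c("A", "A", "BL", "BL", "BL", "BL"),
  Ship        = c("B", "B", "AM", "AM", "B", "B"),
  Netlocation = "S",
  stringsAsFactors = FALSE
)
tows <- data.frame(
  Cruise      = c(22, 400, 260, 260, 22),
  Order       = c(1, 1, 1, 10, 99),
  Townumber   = c(4, 4, 4, 10, 4),
  Towtype     = c("A", "BL", "BL", "BL", "BL"),
  Ship        = c("B", "AM", "B", "B", "B"),
  Netlocation = "S",
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the key columns into one string per row.
# Assumes "|" never occurs inside any of the key values.
make_key <- function(df, cols) do.call(paste, c(df[cols], sep = "|"))

# TRUE for rows of tows whose key combination appears nowhere in catches.
unmatched <- !(make_key(tows, keys) %in% make_key(catches, keys))

New <- tows[unmatched, ]
print(New)  # rows 2, 4 and 5 of tows
```

Because %in% is built on match(), each key of “tows” is looked up against the
whole set of “catches” keys rather than against the row in the same position,
which is the comparison the question describes.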
Thanks for your time and help (in advance).
Dan.


