Re: [R] conditional selection of dataframe rows

Marc Schwartz Thu, 12 Aug 2010 14:13:43 -0700

On Aug 12, 2010, at 3:06 PM, Toby Gass wrote:

> Thank you all for the quick responses.  So far as I've checked, 
> Marc's solution works perfectly and is quite speedy.  I'm still 
> trying to figure out what it is doing. :)
> 
> Henrique's solution seems to need some columns somewhere.  David's 
> solution does not find all the other measurements, possibly with 
> positive values, taken on the same day.
> 
> Thank you again for your efforts.
> 
> Toby


<snip>

Toby,

Working from the inside out:

The ave() function splits a vector (here, the SLOPE column) into sub-groups 
defined by one or more grouping factors, internally using split(), and then 
passes each sub-group to the function given in the FUN argument by using 
lapply(). By default, that function is mean(). 

The great thing about ave() is that it replicates the scalar sub-group result 
of the function once for each row in the sub-group. In addition, the result 
vector is in the order of the rows in the original data frame, rather than in 
the order of the sub-group rows. So in this case, if any row in a sub-group 
has a negative SLOPE value, all rows in that sub-group get a TRUE.
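
For anyone who wants to see those steps spelled out, here is a rough by-hand 
sketch of what ave() appears to do. It is an approximation for illustration, 
not the actual source of ave(). The 'toy' data frame is reconstructed from the 
output printed in this thread, and the names g, res and by_hand are made up 
for this example.

```r
# 'toy' is rebuilt from the printed output in this thread (an assumption;
# the original definition was in an earlier, snipped message).
toy <- data.frame(CH    = c(3, 4, 5, 3, 4, 5, 3, 4, 5),
                  DAY   = c(4, 4, 4, 4, 4, 5, 5, 5, 5),
                  SLOPE = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0, -0.1))

# One grouping factor with a level for each CH/DAY combination
g <- interaction(toy$CH, toy$DAY)

# Apply the function to the SLOPE values of each sub-group (one scalar each)
res <- lapply(split(toy$SLOPE, g), function(x) any(x < 0))

# Spread each scalar back over its rows, in the original row order
by_hand <- unsplit(res, g)
by_hand
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

# The same flags from ave(), which returns them as 0/1 rather than FALSE/TRUE
via_ave <- with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
all(as.logical(via_ave) == by_hand)
# [1] TRUE
```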


You can get an initial feel for the internal data organizing process by using:

> split(toy, list(toy$CH, toy$DAY))
$`3.4`
  CH DAY SLOPE
1  3   4   0.2
4  3   4   0.5

$`4.4`
  CH DAY SLOPE
2  4   4   0.3
5  4   4   0.6

$`5.4`
  CH DAY SLOPE
3  5   4   0.4

$`3.5`
  CH DAY SLOPE
7  3   5   0.1

$`4.5`
  CH DAY SLOPE
8  4   5     0

$`5.5`
  CH DAY SLOPE
6  5   5   0.2
9  5   5  -0.1
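
If it helps to see which original rows land in which sub-group, the same 
split can be applied to the row numbers instead of the whole data frame. This 
is only a small sketch; 'toy' is reconstructed from the output printed in this 
thread, and idx is a name made up for the example.

```r
# 'toy' is rebuilt from the printed output in this thread (an assumption).
toy <- data.frame(CH    = c(3, 4, 5, 3, 4, 5, 3, 4, 5),
                  DAY   = c(4, 4, 4, 4, 4, 5, 5, 5, 5),
                  SLOPE = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0, -0.1))

# Row numbers of 'toy', grouped by CH and DAY
idx <- split(seq_len(nrow(toy)), list(toy$CH, toy$DAY))

# Row order as the sub-groups are laid out, which differs from 1:9
unname(unlist(idx))
# [1] 1 4 2 5 3 7 8 6 9
```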



So the first step is:

> with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
[1] 0 0 0 0 0 1 0 0 1


Note that I use with() to define that SLOPE, CH and DAY are all to be evaluated 
(found) within the 'toy' data frame. That is easier than using:

> ave(toy$SLOPE, toy$CH, toy$DAY, FUN = function(x) any(x < 0))
[1] 0 0 0 0 0 1 0 0 1


This returns a vector of 0's and 1's (FALSE and TRUE coerced to numeric, 
because ave() writes the results back into a copy of the numeric SLOPE 
vector). Note that the returned vector does not correspond to the sequence of 
rows in the result of split() above, but to the sequence of rows in the 
original 'toy' data frame. That is, rows 6 and 9 are 1 (TRUE):

> cbind(toy, flag = with(toy, ave(SLOPE, CH, DAY, 
                                  FUN = function(x) any(x < 0))))
  CH DAY SLOPE flag
1  3   4   0.2    0
2  4   4   0.3    0
3  5   4   0.4    0
4  3   4   0.5    0
5  4   4   0.6    0
6  5   5   0.2    1
7  3   5   0.1    0
8  4   5   0.0    0
9  5   5  -0.1    1
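
A quick way to check the 0/1 coding and what negation does to it is sketched 
below. Again, 'toy' is reconstructed from the output printed in this thread, 
and flag is just a name chosen for the example.

```r
# 'toy' is rebuilt from the printed output in this thread (an assumption).
toy <- data.frame(CH    = c(3, 4, 5, 3, 4, 5, 3, 4, 5),
                  DAY   = c(4, 4, 4, 4, 4, 5, 5, 5, 5),
                  SLOPE = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0, -0.1))

flag <- with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

# The result is numeric, not logical
class(flag)
# [1] "numeric"

# Rows whose sub-group contains a negative SLOPE
which(flag == 1)
# [1] 6 9

# Negation turns 0 into TRUE and 1 into FALSE, giving a logical vector
!flag
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
```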


The next step is to remove those rows. You could do that with regular 
indexing, but subset() evaluates its arguments within the data frame given, 
just as with() did above. Thus, I can eliminate the use of with() and have a 
shorter solution. Then, by negating the result of ave() so that 0 (FALSE) 
becomes TRUE, I retain only those rows where the ave() result was 0:

> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
  CH DAY SLOPE
1  3   4   0.2
2  4   4   0.3
3  5   4   0.4
4  3   4   0.5
5  4   4   0.6
7  3   5   0.1
8  4   5   0.0
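
For comparison, the regular indexing version mentioned above might look like 
the following sketch. 'toy' is reconstructed from the output printed in this 
thread, and keep, by_index and by_subset are names made up for the example.

```r
# 'toy' is rebuilt from the printed output in this thread (an assumption).
toy <- data.frame(CH    = c(3, 4, 5, 3, 4, 5, 3, 4, 5),
                  DAY   = c(4, 4, 4, 4, 4, 5, 5, 5, 5),
                  SLOPE = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0, -0.1))

# Logical vector: TRUE for rows whose sub-group has no negative SLOPE
keep <- !with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

# Regular indexing: keep those rows and all columns
by_index <- toy[keep, ]

# The subset() version from above, for comparison
by_subset <- subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))

# Row names of the retained rows
rownames(by_index)
# [1] "1" "2" "3" "4" "5" "7" "8"
```

With indexing, the with() call comes back, which is why the subset() form ends 
up shorter.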


I hope that clarifies the process.

Marc

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
