On Sat, May 30, 2009 at 11:59 AM, Stavros Macrakis <[email protected]> wrote:
> Since R is object-oriented, data frame set operations should be the natural
> operations for their class. There are, I suppose, two natural ways: the
> column-wise (variable-wise) and the row-wise (observation-wise) one. The
> row-wise one seems more natural and more useful to me.
> ...
>
> The row-wise interpretation makes sense in cases where observations with
> the same values for all variables can be considered redundant. That seems
> to me a much more useful interpretation. The union, intersection, and set
> difference of two sets of observations would seem to all be highly useful.
>
Another argument for the row-wise interpretation: the `subset` function
(also part of base) works that way on data frames.
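
To make the row-wise reading concrete, here is a minimal sketch. The helpers `row_key`, `row_in`, `row_union`, `row_intersect` and `row_setdiff` are hypothetical names of my own, not part of base R, and the sketch assumes both data frames have the same columns in the same order and that no value contains the "\r" separator:

```r
# subset() already selects observations (rows), not variables.
x <- data.frame(a = 1:3, b = c("p", "q", "r"))
y <- data.frame(a = 2:4, b = c("q", "r", "s"))
subset(x, a > 1)   # keeps the rows where a > 1

# Hypothetical helpers (not in base R): one string key per row.
# Assumes identical column layout and no "\r" inside any value.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
row_in  <- function(x, table) row_key(x) %in% row_key(table)

# Row-wise set operations; duplicates are dropped, as for sets.
row_union     <- function(x, y) unique(rbind(x, y))
row_intersect <- function(x, y) unique(x[row_in(x, y), , drop = FALSE])
row_setdiff   <- function(x, y) unique(x[!row_in(x, y), , drop = FALSE])

row_in(x, y)       # one logical value per row of x
row_intersect(x, y)
row_setdiff(x, y)
```

Keying rows by pasted strings is only one possible design; it is simple but compares values by their character form, so it is an illustration rather than a proposal for the actual methods.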
Interestingly, %in%/match appears to work neither row-wise nor column-wise:
1 %in% data.frame(a=1:3) # FALSE (would be true if row-wise)
1:3 %in% data.frame(a=1:3) # FALSE FALSE FALSE (would be true if column-wise)
but simply treats the data frame as a *character* list:
1 %in% data.frame(a=2,b=1) # TRUE
'1' %in% data.frame(a=2,b=1) # TRUE
1 %in% data.frame(a=2:3,b=1:2) # FALSE
1:3 %in% data.frame(a=2:4,b=1:3) # FALSE FALSE FALSE
'1:3' %in% data.frame(a=2:4,b=1:3) # TRUE
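
The conversion behind these results can be inspected with `as.character`, which, as far as I can tell, gives the same strings that match compares against when the table is a list (that equivalence is my assumption, not something I have checked in the C source):

```r
df <- data.frame(a = 2:4, b = 1:3)

# Each column becomes a single deparsed string, not a vector of values.
as.character(df)                         # "2:4" "1:3"
"1:3" %in% as.character(df)              # TRUE
1:3 %in% as.character(df)                # FALSE FALSE FALSE

# With one value per column, the deparsed strings are just the values.
as.character(data.frame(a = 2, b = 1))   # "2" "1"
```

So a value matches only when it equals the deparsed text of an entire column.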
This behavior is clearly documented in ?match (lists are converted to
character vectors), but I am mystified by it. Perhaps someone from R core
can shed light on the rationale?
-s
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel