Re: [R] x %>% y as an alternative to which( x > y)

Duncan Murdoch Tue, 13 Sep 2011 14:32:31 -0700

On 11-09-13 5:17 PM, Timothy Bates wrote:

Dear Duncan and Hadley,


I stumbled across the NA behavior of subset a little while ago and thought it 
might do the trick. But my common use case is not getting a subset sans 
NAs, but setting values in the whole dataframe.

So I need T/F at each row, not just the list of rows that match the subset of 
matching cases...

How would you do this with subset?

    data[data$YOB < 1908 & !is.na(data$YOB), "Age"] = NA

Unlike Hadley, I didn't mean to use the subset() function, I was just talking about computing the subset first, and doing the rest later. So you would write that as something like


complete <- !is.na(data$YOB)
data[complete & data$YOB < 1908, "Age"] <- NA

Of course, this isn't really necessary when you're only checking one variable, but completeness tests are often more complicated.
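
As a rough sketch of the more complicated case, assuming a made-up data frame (the Sex column and all values below are hypothetical, not from your data), the completeness test can be computed once with complete.cases() and then reused:

```r
# Hypothetical data, for illustration only
data <- data.frame(
  YOB = c(1900, NA, 1950, 1905),
  Sex = c("male", "female", NA, "male"),
  Age = c(111, 40, 61, 106)
)

# TRUE only for rows where both YOB and Sex are non-missing
complete <- complete.cases(data[, c("YOB", "Sex")])

# FALSE & NA is FALSE, so incomplete rows are never selected
sel <- complete & data$YOB < 1908
data[sel, "Age"] <- NA
```

Here only rows 1 and 4 are complete and born before 1908, so only their Age is set to NA.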


More below...

My %<% idea extends the vocabulary established by %in%, and works in the same 
grammatical situation.

here's a real example

# Fix missing T2 sex for same sex pairs...

# only set flynn F for people under 12, not inc NAs
twinData[twinData$Age %<% 12, "flynnEffect"] = FALSE

Addressing Duncan's point about returning a logical array... the %<% function 
should be:

"%<%" <- function(table, x) {
        lessThan = table < x
        lessThan[is.na(lessThan)] = FALSE
        return(lessThan)
}

I think that still doesn't work quite right. You want the conversion of NA to FALSE to happen as the last part of evaluating an expression, not in intermediate steps. Otherwise


!(a %<% 10)

will give TRUE for NA values, which may not be as intended, if your intention was to skip NA cases.
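
A small sketch of the problem, using the %<% definition quoted above and a made-up vector a (the values are hypothetical):

```r
# Definition as proposed earlier in the thread
"%<%" <- function(table, x) {
  lessThan <- table < x
  lessThan[is.na(lessThan)] <- FALSE
  lessThan
}

a <- c(5, NA, 15)  # hypothetical values

a %<% 10      # TRUE FALSE FALSE: the NA is skipped, as intended
!(a %<% 10)   # FALSE TRUE TRUE: the NA now counts as "not less than 10"

# Converting NA to FALSE as the last step keeps the NA out of the selection
keep <- !(a < 10)              # FALSE NA TRUE
result <- keep & !is.na(keep)  # FALSE FALSE TRUE
```

The last two lines are just one way to postpone the conversion; the point is only that it has to happen after the negation, not before it.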


Duncan Murdoch

This also works for matrices as it should

x = matrix(c(1:10,NA,12:20),nrow=2)
x %<% 6

      [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] TRUE TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE


On Sep 13, 2011, at 8:40 PM, Hadley Wickham wrote:

Because in coding, I often end up with big chunks looking like this:

((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) &
(mydataframeName$myotherVariableName == "male" &
!is.na(mydataframeName$myotherVariableName)))

Which is much less readable/maintainable/editable than

mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"

Use subset:

subset(mydataframeName, myvariableName > 2 & myotherVariableName == "male")

(subset automatically treats NAs as false)
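
For instance, with a small made-up data frame (the values are hypothetical), rows where the condition evaluates to NA are dropped by subset(), whereas plain logical indexing returns them as rows of NAs:

```r
# Hypothetical data, for illustration only
df <- data.frame(
  myvariableName      = c(3, NA, 1, 5),
  myotherVariableName = c("male", "male", "male", NA)
)

# Row 2 (NA > 2) and row 4 (NA == "male") give NA conditions and are dropped
res <- subset(df, myvariableName > 2 & myotherVariableName == "male")

# Plain logical indexing keeps each NA condition as an all-NA row instead
raw <- df[df$myvariableName > 2 & df$myotherVariableName == "male", ]

nrow(res)  # 1
nrow(raw)  # 3
```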

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


