On 11-09-13 5:17 PM, Timothy Bates wrote:
Dear Duncan and Hadley,

I stumbled across the NA behavior of subset a little while ago and thought it
might do the trick. But my common use case is not getting a subset sans NAs;
it is setting values in the whole data frame.

So I need TRUE/FALSE for each row, not just the list of rows that match the
subset condition...

How would you do this with subset?

    data[data$YOB < 1908 & !is.na(data$YOB), "Age"] <- NA
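A minimal sketch of why the `!is.na()` term is needed in the line above, assuming a small made-up `data` frame with `YOB` and `Age` columns: without it, `data$YOB < 1908` contains NA, and assigning through a data-frame row index that contains NA stops with an error instead of skipping those rows.

```r
# Hypothetical toy data; the second YOB is missing
data <- data.frame(YOB = c(1900, NA, 1950), Age = c(111, 80, 61))

# Unguarded: data$YOB < 1908 is c(TRUE, NA, FALSE), and the NA in the
# row index makes the data-frame assignment stop with an error
res <- tryCatch({
  data[data$YOB < 1908, "Age"] <- NA
  "no error"
}, error = function(e) conditionMessage(e))
print(res)

# Guarded: NA & FALSE is FALSE, so rows with unknown YOB drop out of the index
data[data$YOB < 1908 & !is.na(data$YOB), "Age"] <- NA
print(data)
```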

Unlike Hadley, I didn't mean to use the subset() function; I was just talking about computing the subset first and doing the rest later. So you would write that as something like:

complete <- !is.na(data$YOB)
data[complete & data$YOB < 1908, "Age"] <- NA

Of course, this isn't really necessary when you're only checking one variable, but completeness tests are often more complicated.
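As a sketch of the "compute the subset first" pattern when the completeness test spans several variables, the base R function `complete.cases()` can build the `complete` mask in one step. The toy `data` frame and its `Sex` column are hypothetical, added only for illustration.

```r
# Hypothetical data with missing values in more than one column
data <- data.frame(
  YOB = c(1900, NA, 1950, 1905),
  Sex = c("male", "female", NA, "female"),
  Age = c(111, 80, 61, 106)
)

# Completeness mask computed once, over every variable the test uses
complete <- complete.cases(data[, c("YOB", "Sex")])

# FALSE & NA is FALSE, so incomplete rows can never be selected
sel <- complete & data$YOB < 1908 & data$Sex == "female"
data[sel, "Age"] <- NA
print(data)
```

Because `complete` already covers every column the condition reads, `sel` contains no NA and the assignment is safe.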

More below...
My %<% idea extends the vocabulary established by %in%, and works in the same 
grammatical situation.
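For comparison, a short sketch of the `%in%` behavior being extended: base R defines `%in%` as `match(x, table, nomatch = 0L) > 0L`, so it returns FALSE rather than NA when an element has no match, whereas `==` and `<` propagate NA. The vector `x` is illustrative only.

```r
x <- c(1, NA, 3)

# match() returns 0 for the unmatched NA, so %in% gives a plain FALSE
print(x %in% c(1, 3))   # TRUE FALSE TRUE

# Comparison operators propagate NA instead
print(x == 1)           # TRUE NA FALSE
```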

Here's a real example:

# Fix missing T2 sex for same sex pairs...

# only set flynnEffect to FALSE for people under 12, not including NAs
twinData[twinData$Age %<% 12, "flynnEffect"] <- FALSE

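Addressing Duncan's point about returning a logical array... the %<% function
should be:

"%<%" <- function(table, x) {
    # element-wise table < x; keeps the shape of table (vector or matrix)
    lessThan <- table < x
    # comparisons involving NA return NA; treat those as FALSE
    lessThan[is.na(lessThan)] <- FALSE
    return(lessThan)
}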
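A quick check of the operator above, assuming a hypothetical stand-in for `twinData` with an `Age` column that contains an NA. The function is repeated so the block runs on its own; since its result has no NA, it can be used directly as a row index in an assignment.

```r
# Copy of the %<% operator proposed above: table < x, with NA treated as FALSE
"%<%" <- function(table, x) {
  lessThan <- table < x
  lessThan[is.na(lessThan)] <- FALSE
  lessThan
}

# Hypothetical stand-in for twinData
twinData <- data.frame(Age = c(10, NA, 15, 8), flynnEffect = TRUE)

print(twinData$Age < 12)     # TRUE NA FALSE TRUE
print(twinData$Age %<% 12)   # TRUE FALSE FALSE TRUE

# No NA in the row index, so the data-frame assignment succeeds
twinData[twinData$Age %<% 12, "flynnEffect"] <- FALSE
print(twinData)
```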

I think that still doesn't work quite right. You want the conversion of NA to FALSE to happen as the last part of evaluating an expression, not in intermediate steps. Otherwise

!(a %<% 10)

will give TRUE for NA values, which may not be what you intended if your goal was to skip NA cases.
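A sketch of this point, reusing the `%<%` definition from earlier in the thread. The helper `naFalse()` is hypothetical (it is not part of base R); it applies the NA-to-FALSE conversion once, to the value of the whole expression, so negation no longer pulls the missing case in.

```r
"%<%" <- function(table, x) {
  lessThan <- table < x
  lessThan[is.na(lessThan)] <- FALSE
  lessThan
}

a <- c(5, NA, 20)

# NA is converted to FALSE inside %<%, so negation turns it into TRUE
print(!(a %<% 10))          # FALSE TRUE TRUE

# Hypothetical helper: convert NA to FALSE only at the very end
naFalse <- function(cond) cond & !is.na(cond)

print(naFalse(!(a < 10)))   # FALSE FALSE TRUE
```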

Duncan Murdoch

This also works for matrices, as it should:

x <- matrix(c(1:10, NA, 12:20), nrow = 2)
x %<% 6
      [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] TRUE TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE


On Sep 13, 2011, at 8:40 PM, Hadley Wickham wrote:

Because in coding, I often end up with big chunks looking like this:

((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) &
  (mydataframeName$myotherVariableName == "male" &
   !is.na(mydataframeName$myotherVariableName)))

Which is much less readable/maintainable/editable than

mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"

Use subset:

subset(mydataframeName, myvariableName > 2 & myotherVariableName == "male")

(subset automatically treats NA values of the condition as FALSE)
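A minimal sketch contrasting the two approaches, assuming a toy data frame that reuses the placeholder names from the thread: indexing with `[` turns each NA in the condition into a row of NAs, while `subset()` keeps only the rows where the condition is TRUE.

```r
# Hypothetical data frame using the placeholder names from the thread
mydataframeName <- data.frame(
  myvariableName      = c(3, NA, 5, 1),
  myotherVariableName = c("male", "male", NA, "male")
)

cond <- with(mydataframeName, myvariableName > 2 & myotherVariableName == "male")
print(cond)                            # TRUE NA NA FALSE

# `[` returns a row of NAs for each NA in the condition
print(nrow(mydataframeName[cond, ]))   # 3

# subset() drops rows where the condition is NA
print(nrow(subset(mydataframeName,
                  myvariableName > 2 & myotherVariableName == "male")))   # 1
```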

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
