On 11-09-13 5:17 PM, Timothy Bates wrote:
Dear Duncan and Hadley,
I stumbled across the NA behavior of subset() a little while ago and thought it
might do the trick. But my common use case is not getting a subset sans
NAs, but setting values in the whole data frame.
So I need T/F at each row, not just the rows that match the subset
condition...
How would you do this with subset?
data[data$YOB < 1908 & !is.na(data$YOB), "Age"] = NA
Unlike Hadley, I didn't mean to use the subset() function, I was just
talking about computing the subset first, and doing the rest later. So
you would write that as something like
complete <- !is.na(data$YOB)
data[complete & data$YOB < 1908, "Age"] <- NA
Of course, this isn't really necessary when you're only checking one
variable, but completeness tests are often more complicated.
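
As an illustration of a more complicated completeness test, here is a minimal sketch that computes the completeness vector over several columns with complete.cases() and then reuses it. The data frame contents and the Sex column are assumptions made up for the example; only YOB and Age appear in the thread.

```r
# Hypothetical data: the values and the Sex column are assumptions.
data <- data.frame(
  YOB = c(1900, NA, 1950, 1905),
  Sex = c("male", "female", NA, "female"),
  Age = c(111, 40, 61, 106)
)

# TRUE only for rows where every listed column is non-missing.
complete <- complete.cases(data[, c("YOB", "Sex")])

# FALSE & NA is FALSE, so the combined index contains no NAs.
data[complete & data$YOB < 1908, "Age"] <- NA
```

Because the completeness vector is computed once, it can be combined with any number of later conditions without repeating the !is.na() calls.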
More below...
My %<% idea extends the vocabulary established by %in%, and works in the same
grammatical situation.
Here's a real example:
# Fix missing T2 sex for same sex pairs...
twinData[twinData$Age %<% 12, "flynnEffect"] = FALSE  # only set flynn F for people under 12, not inc. NAs
Addressing Duncan's point about returning a logical array... the %<% function
should be:
"%<%" <- function(table, x) {
  lessThan = table < x
  lessThan[is.na(lessThan)] = FALSE  # treat NA comparisons as FALSE
  return(lessThan)
}
I think that still doesn't work quite right. You want the conversion of
NA to FALSE to happen as the last part of evaluating an expression, not
in intermediate steps. Otherwise
!(a %<% 10)
will give TRUE for NA values, which may not be as intended, if your
intention was to skip NA cases.
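
A small sketch of the problem, assuming the %<% definition quoted above. The na_false() helper is hypothetical (it is not from the thread) and is only one possible way to do the NA-to-FALSE conversion as the final step.

```r
# The %<% operator as proposed in the thread: NA comparisons become FALSE.
"%<%" <- function(table, x) {
  lessThan <- table < x
  lessThan[is.na(lessThan)] <- FALSE
  lessThan
}

a <- c(5, NA, 15)

# Conversion happens inside %<%, so negation turns the NA case into TRUE.
negated <- !(a %<% 10)        # FALSE TRUE TRUE

# Hypothetical helper: convert NA to FALSE once, after the whole
# logical expression has been evaluated.
na_false <- function(cond) !is.na(cond) & cond

late <- na_false(!(a < 10))   # FALSE FALSE TRUE
```

With the conversion applied last, the NA element is excluded whether or not the comparison is negated.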
Duncan Murdoch
This also works for matrices, as it should:
x = matrix(c(1:10, NA, 12:20), nrow = 2)
x %<% 6
     [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] TRUE TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
On Sep 13, 2011, at 8:40 PM, Hadley Wickham wrote:
Because in coding, I often end up with big chunks looking like this:
((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) &
 (mydataframeName$myotherVariableName == "male" &
  !is.na(mydataframeName$myotherVariableName)))
Which is much less readable/maintainable/editable than
mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"
Use subset:
subset(mydataframeName, myvariableName > 2 & myotherVariableName == "male")
(subset() automatically treats NAs as FALSE)
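
To make the difference concrete, here is a minimal sketch comparing subset() with plain `[` indexing on the same condition. The data frame contents are invented for illustration; the column names reuse the placeholders above.

```r
# Hypothetical data frame; the values are assumptions.
df <- data.frame(
  myvariableName = c(3, NA, 1, 5),
  myotherVariableName = c("male", "male", "male", NA)
)

# subset() keeps only rows where the condition is TRUE; NA counts as FALSE.
kept <- subset(df, myvariableName > 2 & myotherVariableName == "male")

# Plain `[` indexing with the same condition yields an all-NA row
# for every NA in the logical index.
raw <- df[df$myvariableName > 2 & df$myotherVariableName == "male", ]
```

Here kept has one row, while raw has three, two of them filled with NA. Note that subset() returns a new data frame, so it selects rows but cannot by itself be the target of an assignment into the original.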
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/