Hello,
I'm seeing unexpected behavior from apply() compared to a for loop when a
character column is part of the data frame passed to apply(). Below, I check
whether all non-missing values in each column are <= 3. If I include the
character column, apply() incorrectly returns TRUE for d3. If I pass only the
numeric columns to apply(), the result for d3 is correct. A for loop also
gives the correct result.
> d<-data.frame(d1 = letters[1:3],
+ d2 = c(1,2,3),
+ d3 = c(NA,NA,6))
>
> d
  d1 d2 d3
1  a  1 NA
2  b  2 NA
3  c  3  6
>
> # results are incorrect: d3 should be FALSE
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE  TRUE
>
> # results are correct
> apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d2    d3
 TRUE FALSE
>
> # results are correct
> for(i in names(d)){
+ print(all(d[!is.na(d[,i]),i] <= 3))
+ }
[1] FALSE
[1] TRUE
[1] FALSE
Finally, if I replace the NA values in d3 with numbers and still pass the
character column to apply(), the result is correct.
> d<-data.frame(d1 = letters[1:3],
+ d2 = c(1,2,3),
+ d3 = c(4,5,6))
>
> d
  d1 d2 d3
1  a  1  4
2  b  2  5
3  c  3  6
>
> # results are correct
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE FALSE
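In case it helps anyone reproduce this, here is a small diagnostic sketch. It rests on my reading of ?apply that a data frame is first coerced with as.matrix(), so the function may be seeing strings rather than numbers; I have not confirmed that this is the cause. The names m, num_cols and res are only for illustration.

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# apply() operates on a matrix, so inspect what the coercion produces.
# Printing a character matrix shows the quotes, which makes any padding
# or other formatting of the numeric columns visible.
m <- as.matrix(d)
print(typeof(m))
print(m)

# Possible workaround: test each numeric column on its own so that no
# column is converted to character before the comparison.
num_cols <- vapply(d, is.numeric, logical(1))
res <- vapply(d[num_cols],
              function(x) all(x[!is.na(x)] <= 3),
              logical(1))
print(res)
```

The second part keeps each column's own type because a data frame is iterated as a list of columns, which is why I would expect it to agree with the for loop above.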
Can someone help me understand what's happening?