OK, it turns out that this is documented, even though it looks surprising. First of all, the apply function will try to convert any object with a non-NULL dim value, such as a data frame, to a matrix (my intuition agrees with yours that there should be no conversion), so the first step of the apply function is
> as.matrix.data.frame(d)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"

Since the data frame `d` is a mixture of character and non-character columns, the non-character values are converted to character using the function `format`. However, the problem is that the NA values are also formatted as character, so the other values are padded to the same width:

> format(c(NA, 6))
[1] "NA" " 6"

That's where the space comes from. It is purely for making the result pretty... The character "NA" is turned back into a real NA later, but the space is not stripped. In the comparison, 3 is coerced to the string "3", and the leading space makes " 6" sort before "3", so " 6" <= 3 is TRUE. I would say this is not a good design, and it might be worth not including the NA values in the call to `format`.

For now, I suggest using the function `lapply` to do what you want.

> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE

$d2
[1] TRUE

$d3
[1] FALSE

Everything should work as you expect.

Best,
Jiefei

On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwj...@gmail.com> wrote:
>
> Hi,
>
> I guess this can tell you what happens behind the scene
>
> > d<-data.frame(d1 = letters[1:3],
> +               d2 = c(1,2,3),
> +               d3 = c(NA,NA,6))
> > apply(d, 2, FUN=function(x)x)
>      d1  d2  d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
> > "a"<=3
> [1] FALSE
> > "2"<=3
> [1] TRUE
> > "6"<=3
> [1] FALSE
>
> Note that there is an additional space in the character value " 6",
> that's why your comparison fails. I do not understand why but this
> might be a bug in R
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> <r-help@r-project.org> wrote:
> >
> > Hello,
> >
> > I'm seeing unexpected behavior when using apply() compared to a for loop
> > when a character vector is part of the data subjected to the apply
> > statement. Below, I check whether all non-missing values are <= 3. If I
> > include a character column, apply incorrectly returns TRUE for d3. If I
> > only pass the numeric columns to apply, it is correct for d3. If I use a
> > for loop, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(NA,NA,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1 NA
> > 2  b  2 NA
> > 3  c  3  6
> > >
> > > # results are incorrect
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE  TRUE
> > >
> > > # results are correct
> > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d2    d3
> >  TRUE FALSE
> > >
> > > # results are correct
> > > for(i in names(d)){
> > +   print(all(d[!is.na(d[,i]),i] <= 3))
> > + }
> > [1] FALSE
> > [1] TRUE
> > [1] FALSE
> > >
> >
> > Finally, if I remove the NA values from d3 and include the character column
> > in apply, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(4,5,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1  4
> > 2  b  2  5
> > 3  c  3  6
> > >
> > > # results are correct
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE FALSE
> > >
> >
> > Can someone help me understand what's happening?
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
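
As a follow-up to the `lapply` suggestion in the reply at the top of this thread, here is a minimal sketch of the same column-wise check that returns a named logical vector instead of a list. It assumes you only care about the numeric columns; the names `all_le_3`, `num_cols`, `padded` and `res` are made up for illustration and are not from the original posts.

```r
# Same data frame as in the original post.
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# format() pads every element to a common width, so 6 becomes " 6"
# when it is formatted together with NA.
padded <- format(c(NA, 6))

# Hypothetical helper: are all non-missing values <= 3?
all_le_3 <- function(x) all(x[!is.na(x)] <= 3)

# sapply() walks over the columns of the data frame directly, so no
# conversion to a character matrix happens and d2, d3 stay numeric.
num_cols <- sapply(d, is.numeric)
res <- sapply(d[num_cols], all_le_3)
print(res)
```

Dropping the character column before the check also sidesteps the question of what "a" <= 3 should mean; if you do want a result for every column, `sapply(d, all_le_3)` works the same way.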