OK, it turns out that this behavior is documented (see ?apply and ?as.matrix), even though it looks surprising.

First of all, the `apply` function will try to convert any object with
a non-null dim value, such as a data frame, to a matrix (my intuition
agrees with you that there should be no conversion), so the first step
of the apply function is effectively

> as.matrix.data.frame(d)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"
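A quick way to confirm this step is to compare `as.matrix(d)` with what `apply` actually hands to FUN. This is a minimal sketch; it only rebuilds `d` from the original question, and `m` is an illustrative name:

```r
# Rebuild the example data frame from the original question
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# apply() coerces a two-dimensional object such as a data frame with
# as.matrix(); because d1 is character, the result is a character matrix
m <- as.matrix(d)
typeof(m)          # "character"

# The d3 column keeps the padding added during the coercion
m[, "d3"]          # NA NA " 6"

# Returning each column unchanged from apply() shows the same values,
# so this is exactly what FUN sees for d3
seen <- apply(d, 2, function(x) x)[, "d3"]
seen
```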

Since the data frame `d` mixes character and non-character columns,
each non-character column is converted to character using the function
`format`. The problem is that the NA values are formatted as well, and
`format` pads every element to a common width:

> format(c(NA, 6))
[1] "NA" " 6"
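The sketch below reproduces that step by hand and then shows the comparison that produces the unexpected TRUE. It assumes the per-column logic described in ?as.matrix (format a non-character column, then restore the NAs); the variable names are illustrative, and the collation is pinned to the C locale because string comparison depends on the locale. The last part tries leaving the NA values out of `format`:

```r
# Per-column step, assumed to follow ?as.matrix: format() a non-character
# column, then put the original NAs back
x <- c(NA, NA, 6)
miss <- is.na(x)
xf <- format(x)          # "NA" "NA" " 6" (common width of 2)
is.na(xf) <- miss        # xf is now NA NA " 6": the space survives

# With a character column, `<= 3` compares strings ("3" after coercion).
# String order depends on the collation locale, so pin it to "C" here.
old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
" 6" <= 3                               # TRUE: a space sorts before "3"
"6" <= 3                                # FALSE
padded <- all(xf[!is.na(xf)] <= 3)      # TRUE, the surprising result

# Formatting only the non-missing values avoids the padding
xf2 <- rep(NA_character_, length(x))
xf2[!miss] <- format(x[!miss])          # xf2 is NA NA "6"
unpadded <- all(xf2[!is.na(xf2)] <= 3)  # FALSE

# Restore the caller's collation locale
invisible(Sys.setlocale("LC_COLLATE", old))
```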

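That's where the space comes from: the padding is purely for making the
result line up nicely. The "NA" strings are turned back into real NA
values afterwards, but the space in " 6" is not stripped. Since the
column is now character, `x <= 3` compares strings, and " 6" <= "3" is
TRUE because a space sorts before "3" (string order depends on the
locale's collating sequence, but this holds in the C locale and
evidently in yours). I would say this is not a good design, and it
might be better to exclude the NA values before calling `format`. For
now, I would suggest using the function `lapply` to do what you want.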

> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE
$d2
[1] TRUE
$d3
[1] FALSE
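If a named logical vector like the one `apply` returns is preferred, `vapply` (my substitution, not part of the suggestion above) iterates over the columns the same way `lapply` does. Alternatively, `apply` can be kept by passing it only the numeric columns. A minimal sketch, with `le3`, `num` and `le3_num` as illustrative names:

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# vapply() works column by column without coercing the data frame to a
# matrix, and returns a named logical vector
le3 <- vapply(d, function(x) all(x[!is.na(x)] <= 3), logical(1))
le3
#    d1    d2    d3
# FALSE  TRUE FALSE

# Or keep apply() but give it only the numeric columns, so as.matrix()
# produces a numeric matrix and nothing is formatted
num <- vapply(d, is.numeric, logical(1))
le3_num <- apply(d[num], 2, function(x) all(x[!is.na(x)] <= 3))
le3_num
```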

Everything should work as you expect.

Best,
Jiefei

On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwj...@gmail.com> wrote:
>
> Hi,
>
> I guess this can tell you what happens behind the scenes
>
>
> > d<-data.frame(d1 = letters[1:3],
> +               d2 = c(1,2,3),
> +               d3 = c(NA,NA,6))
> > apply(d, 2, FUN=function(x)x)
>      d1  d2  d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
> > "a"<=3
> [1] FALSE
> > "2"<=3
> [1] TRUE
> > "6"<=3
> [1] FALSE
>
> Note that there is an additional space in the character value " 6";
> that's why your comparison fails. I do not understand why, but this
> might be a bug in R.
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> <r-help@r-project.org> wrote:
> >
> > Hello,
> >
> > I'm seeing unexpected behavior when using apply() compared to a for loop 
> > when a character vector is part of the data subjected to the apply 
> > statement. Below, I check whether all non-missing values are <= 3. If I 
> > include a character column, apply incorrectly returns TRUE for d3. If I 
> > only pass the numeric columns to apply, it is correct for d3. If I use a 
> > for loop, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(NA,NA,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1 NA
> > 2  b  2 NA
> > 3  c  3  6
> > >
> > > # results are incorrect
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE  TRUE
> > >
> > > # results are correct
> > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d2    d3
> >  TRUE FALSE
> > >
> > > # results are correct
> > > for(i in names(d)){
> > +   print(all(d[!is.na(d[,i]),i] <= 3))
> > + }
> > [1] FALSE
> > [1] TRUE
> > [1] FALSE
> >
> >
> > Finally, if I remove the NA values from d3 and include the character column 
> > in apply, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(4,5,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1  4
> > 2  b  2  5
> > 3  c  3  6
> > >
> > > # results are correct
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE FALSE
> >
> >
> > Can someone help me understand what's happening?
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
