This is interesting and does seem suboptimal, especially because if I start with all-character columns from the beginning, it behaves as expected.

> d<-data.frame(d1 = letters[1:3],
+               d2 = c("1","2","3"),
+               d3 = c(NA,NA,"6"))
>
> str(d)
'data.frame':   3 obs. of  3 variables:
 $ d1: chr  "a" "b" "c"
 $ d2: chr  "1" "2" "3"
 $ d3: chr  NA NA "6"
>
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE FALSE


-----Original Message-----
From: Jiefei Wang <szwj...@gmail.com>
Sent: Friday, October 8, 2021 2:22 PM
To: Derickson, Ryan, VHA NCOD <ryan.derick...@va.gov>
Cc: r-help@r-project.org
Subject: [EXTERNAL] Re: [R] unexpected behavior in apply

Ok, it turns out that this is documented, even though it looks surprising.

First of all, the apply function will try to convert any object with the dim attribute to a matrix (my intuition agrees with you that there should be no conversion), so the first step of the apply function is

> as.matrix.data.frame(d)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"

Since the data frame `d` is a mixture of character and non-character columns, the non-character columns are converted to character using the function `format`. However, the problem is that the NA values are formatted along with everything else, so they count toward the common width:

> format(c(NA, 6))
[1] "NA" " 6"

That's where the space comes from. It is purely for making the result pretty... The character "NA" is turned back into a real NA later, but the padding space in " 6" is not stripped. I would say this is not a good design, and it might be worth not including the NA values in the call to `format`.

For now, I suggest using the function `lapply` to do what you want.

> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE

$d2
[1] TRUE

$d3
[1] FALSE

Everything should work as you expect.
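
To look at the padding on its own, and at one way to avoid the matrix conversion, here is a minimal sketch. It assumes the same `d` as in the original post; the helper name `all_le_3` and the choice to check only the numeric columns are illustrative assumptions, not something from the thread. Comparing a string such as " 6" with 3 is a string comparison whose result can depend on the locale, so the sketch keeps the comparison numeric.

```r
# Same data frame as in the original post (stringsAsFactors set explicitly
# so that d1 is character on older R versions as well).
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6),
                stringsAsFactors = FALSE)

# format() pads every element to a common width, and "NA" is two
# characters wide, so 6 is padded to " 6".
padded <- format(c(NA, 6))
print(padded)

# Hypothetical helper: are all non-missing values <= 3?
all_le_3 <- function(x) all(x[!is.na(x)] <= 3)

# Work column by column on the data frame itself, so nothing is coerced
# to character. Restricting to numeric columns is an assumption made here
# to keep the comparison numeric.
num <- vapply(d, is.numeric, logical(1))
res <- vapply(d[num], all_le_3, logical(1))
print(res)
```

Compared with `lapply`, `vapply` returns a named logical vector, which is closer in shape to what `apply` returned in the original post.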

Best,
Jiefei

On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwj...@gmail.com> wrote:
>
> Hi,
>
> I guess this can tell you what happens behind the scene
>
>
> > d<-data.frame(d1 = letters[1:3],
> +               d2 = c(1,2,3),
> +               d3 = c(NA,NA,6))
> > apply(d, 2, FUN=function(x)x)
>      d1  d2  d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
> > "a"<=3
> [1] FALSE
> > "2"<=3
> [1] TRUE
> > "6"<=3
> [1] FALSE
>
> Note that there is an additional space in the character value " 6",
> that's why your comparison fails. I do not understand why but this
> might be a bug in R
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> <r-help@r-project.org> wrote:
> >
> > Hello,
> >
> > I'm seeing unexpected behavior when using apply() compared to a for loop
> > when a character vector is part of the data subjected to the apply
> > statement. Below, I check whether all non-missing values are <= 3. If I
> > include a character column, apply incorrectly returns TRUE for d3. If I
> > only pass the numeric columns to apply, it is correct for d3. If I use a
> > for loop, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(NA,NA,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1 NA
> > 2  b  2 NA
> > 3  c  3  6
> > >
> > > # results are incorrect
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE  TRUE
> > >
> > > # results are correct
> > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d2    d3
> >  TRUE FALSE
> > >
> > > # results are correct
> > > for(i in names(d)){
> > +   print(all(d[!is.na(d[,i]),i] <= 3)) }
> > [1] FALSE
> > [1] TRUE
> > [1] FALSE
> > >
> >
> > Finally, if I remove the NA values from d3 and include the character column
> > in apply, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(4,5,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1  4
> > 2  b  2  5
> > 3  c  3  6
> > >
> > > # results are correct
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE FALSE
> > >
> >
> > Can someone help me understand what's happening?
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.r-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.